Mean field conditions for coalescing random walks 



Roberto Imbuzeiro Oliveira* 
June 4, 2012 

Abstract 

The main results in this paper are about the full coalescence time C of a system of 
coalescing random walks over a finite graph G. Letting m(G) denote the mean meeting 
time of two such walkers, we give sufficient conditions under which $\mathbb{E}[C] \approx 2m(G)$
and C/m(G) has approximately the same law as in the "mean field" setting of a large 
complete graph. One of our theorems is that mean field behavior occurs over all vertex- 
transitive graphs whose mixing times are much smaller than m(G); this nearly solves 
an open problem of Aldous and Fill and also generalizes results of Cox for discrete tori 
in $d \geq 2$ dimensions. Other results apply to non-reversible walks and also generalize
previous theorems of Durrett and Cooper et al. Slight extensions of these results apply 
to voter model consensus times, which are related to coalescing random walks via 
duality. 

Our main proof ideas are a strengthening of the usual approximation of hitting times
by exponential random variables, which gives results for non-stationary initial states;
and a new general set of conditions under which we can prove that the hitting time of a
union of sets behaves like a minimum of independent exponentials. In particular, this
will show that the first meeting time among $k$ random walkers has mean $\approx m(G)/\binom{k}{2}$.

1 Introduction 

Start a continuous-time random walk from each vertex of a finite, connected graph G. The 
walkers evolve independently, except that when two walkers meet (i.e. lie on the same vertex
at the same time), they coalesce into one. One may easily show that there will almost surely
be a finite time at which only one walk will remain in this system. The first such time is 
called the full coalescence time for G and is denoted by C. 
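
As an illustration of the process just described, the following Python sketch samples the full coalescence time $C$ by direct simulation. It is a minimal sketch under stated assumptions, not the construction used in the paper: walkers jump at rate 1 to a uniformly chosen neighbour, the graph is given as a hypothetical adjacency dictionary `adj`, and the helper names `coalescence_time` and `complete_graph` are ours.

```python
import random

def coalescence_time(adj, rng):
    """Sample the full coalescence time C for unit-rate walkers on a graph.

    adj maps each vertex to the list of its neighbours (assumed connected,
    with sortable vertex labels). One walker starts at every vertex.
    """
    occupied = set(adj)
    t = 0.0
    while len(occupied) > 1:
        # Each of the k surviving walkers jumps at rate 1, so the next jump
        # comes after an exponential time of rate k, made by a uniform walker.
        t += rng.expovariate(len(occupied))
        x = rng.choice(sorted(occupied))
        y = rng.choice(adj[x])
        occupied.discard(x)
        occupied.add(y)  # landing on an occupied vertex merges the two walkers
    return t

def complete_graph(n):
    """Adjacency dictionary of the complete graph on vertices 0..n-1."""
    return {v: [u for u in range(n) if u != v] for v in range(n)}

rng = random.Random(0)
samples = [coalescence_time(complete_graph(5), rng) for _ in range(3000)]
print(sum(samples) / len(samples))
```

On the complete graph with $n$ vertices and this choice of rates, the exact mean of $C$ is $(n-1)^2/n$, which gives a simple sanity check for the simulation.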

The main goal of this paper is to show that one can estimate the law of C for a large family 
of graphs G, and that this law only depends on G through a single rescaling parameter. More 
precisely, we will prove results of the following form: if the mixing time $t^G_{\mathrm{mix}}$ of $G$ (defined
in Section 2) is "small", then there exists a parameter $m(G) > 0$ such that the law of $C/m(G)$
takes a universal shape. Slight extensions of these results will be used to study the so-called
voter model consensus time on $G$.

The universal shape of $C/m(G)$ comes from a mean field computation over a large complete
graph $K_n$. In this case the distribution of $C$ can be computed exactly (cf. [4, Chapter
14]):

\[ \frac{C}{(n-1)/2} =_d \sum_{i=2}^{n} Z_i, \]

where:

\[ Z_2, Z_3, Z_4, \dots \text{ are independent and } \forall i \geq 2,\ t \geq 0 :\ \mathbb{P}(Z_i \geq t) = e^{-t\binom{i}{2}}. \tag{1} \]

In words, $C$ is a rescaled sum of independent exponential random variables with means $1/\binom{i}{2}$,
$2 \leq i \leq n$.
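
The means in (1) can be summed exactly. The short Python sketch below (function names are ours, chosen for illustration) evaluates $\sum_{i=2}^{n} 1/\binom{i}{2}$ with rational arithmetic; since $1/\binom{i}{2} = 2\left(\frac{1}{i-1} - \frac{1}{i}\right)$, the sum telescopes to $2(1 - 1/n)$.

```python
from fractions import Fraction
from math import comb

def mean_field_sum(n):
    """Exact value of sum_{i=2}^n E[Z_i] = sum_{i=2}^n 1 / C(i, 2)."""
    return sum(Fraction(1, comb(i, 2)) for i in range(2, n + 1))

def expected_coalescence_time_complete_graph(n):
    """E[C] on K_n: the scaling factor (n - 1)/2 times the sum above."""
    return Fraction(n - 1, 2) * mean_field_sum(n)

for n in (2, 5, 10, 100):
    print(n, mean_field_sum(n), expected_coalescence_time_complete_graph(n))
```

The values approach 2 as $n$ grows, which is the constant appearing in the mean field prediction for $\mathbb{E}[C]/m(G)$.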

The scaling factor $(n-1)/2$ is the expected meeting time of two independent random
walks over $K_n$, and we see that

\[ \frac{C}{(n-1)/2} \to_w \sum_{i \geq 2} Z_i \quad \text{and} \quad \frac{\mathbb{E}[C]}{(n-1)/2} \to 2 \quad \text{when $n$ grows.} \]



This suggests the general problem we address in this paper: 

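General problem: Given a graph $G$, let $m(G)$ denote the expected meeting
time of two independent random walks over $G$, both started from stationarity.
Give sufficient conditions on $G$ under which $C$ has mean-field behavior, that is:

\[ \mathrm{Law}\left(\frac{C}{m(G)}\right) \approx \mathrm{Law}\left(\sum_{i \geq 2} Z_i\right) \tag{2} \]

and

\[ \mathbb{E}[C] \approx m(G)\, \mathbb{E}\left[\sum_{i \geq 2} Z_i\right] = 2m(G). \tag{3} \]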



A version of this problem was posed in Aldous and Fill's 1994 draft [4, Chapter 14] and
much more recently by Aldous [2]. However, as far as we know there are only two families
of examples for which the problem has been fully solved. Discrete tori $G = (\mathbb{Z}/m\mathbb{Z})^d$ with $d \geq 2$
fixed and $m \gg 1$ were considered in Cox's 1989 paper [7]. More recently, Cooper, Frieze
and Radzik [6] proved mean field behavior in large random $d$-regular graphs ($d$ bounded).
Partial results were also obtained by Durrett [?, 9] for certain models of large networks.

We note that mean-field behaviour is not universal over all large graphs. One counterexample
comes from a sequence of growing cycles, where the limiting law of $C$ was also
computed by Cox [?]. Stars with $n$ vertices are also not mean field: $C$ is lower bounded by
the time the last edge of the star is crossed by some walker, which is about $\log n$, whereas
$m(G)$ is uniformly bounded.

1.1 Results for transitive, reversible chains



Our results in this paper address (2) and (3) simultaneously by proving approximation
bounds in $L_1$ Wasserstein distance, which implies closeness of first moments (cf. Section 2).

The first theorem implies that mean field behaviour occurs whenever $G$ is vertex-transitive
and its mixing time (defined in Section 2) is much smaller than $m(G)$. This nearly solves a
problem posed by Aldous and Fill in [4, Chapter 14]. In their Open Problem 12, they ask
for an analogous result with the relaxation time replacing the mixing time (more on this
below).

The natural setting for this first theorem is that of walkers evolving according to the 
same reversible, transitive Markov chain (the definition of C easily generalizes to this case), 
where transitive means that for any two states x and y one can find a permutation of the 
state space mapping x to y and leaving the transition rates invariant. Clearly, random walk 
on a vertex-transitive graph is transitive in this sense. 



Notational convention 1 In this paper we will use "$b = O(a)$" in the following sense:
there exist universal constants $C, \xi > 0$ such that $|a| \leq \xi \Rightarrow |b| \leq C|a|$.



Theorem 1.1 (Mean field for transitive, reversible chains) Let $Q$ be the (generator
of a) transitive, reversible, irreducible Markov chain over a finite state space $V$, with mixing
time $t^Q_{\mathrm{mix}}$. Define $m(Q)$ to be the expected meeting time of two independent continuous-time
random walks over $V$ that evolve according to $Q$, when both are started from stationarity.
Denote by $C$ the full coalescence time for walks evolving according to $Q$. Finally, let
$\{Z_i\}_{i \geq 2}$ be as in (1). Then:

\[ d_W\left(\mathrm{Law}\left(\frac{C}{m(Q)}\right), \mathrm{Law}\left(\sum_{i \geq 2} Z_i\right)\right) = O\left(\left(\rho(Q) \ln \frac{1}{\rho(Q)}\right)^{1/6}\right), \]

where:

\[ \rho(Q) \equiv \frac{t^Q_{\mathrm{mix}}}{m(Q)} \]

and $d_W$ denotes $L_1$ Wasserstein distance. In particular,

\[ \mathbb{E}[C] = \left(2 + O\left(\left(\rho(Q) \ln \frac{1}{\rho(Q)}\right)^{1/6}\right)\right) m(Q). \]



This result generalizes Cox's theorem [7] for $(\mathbb{Z}/m\mathbb{Z})^d$ with $d \geq 2$ and growing $m$. In this
case, for any fixed $d$, the mixing time grows as $m^2$ whereas $m(G) \sim m^2 \ln m$ for $d = 2$ and
$m(G) \sim m^d$ for larger $d$. The original problem posed by Aldous and Fill remains open, but
we note that:



• For transitive, reversible chains, the mixing time is at most a $C \ln|V|$ factor away
from the relaxation time, with $C > 0$ universal (this is true whenever the stationary
distribution is uniform). This means we are not too far off from a full solution;

• Any counterexample to their problem would have to come from a vertex-transitive
graph with mixing time of the order of $m(G)$ and relaxation time asymptotically smaller
than the mixing time. To the best of our knowledge, such an object is not known to
exist.



1.2 Results for other chains 

We also have results on coalescing random walks evolving according to arbitrary generators Q 
on finite state spaces $V$. Again, we only require that the mixing time $t^Q_{\mathrm{mix}}$ of $Q$ be sufficiently
small relative to other parameters of the chain. 



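Theorem 1.2 (Mean field for general Markov chains) Let $Q$ denote (the generator of)
a mixing Markov chain over a finite set $V$, with unique stationary distribution $\pi$. Denote
by $q_{\max}$ the maximum transition rate from any $x \in V$ and by $\pi_{\max}$ the maximum stationary
probability of an element of $V$. Let $m(Q)$ denote the expected meeting time of two random
walks evolving according to $Q$, both started from $\pi$. Finally, let $C$ denote the full coalescence
time of random walks evolving in $V$ according to $Q$. Then:

\[ d_W\left(\mathrm{Law}\left(\frac{C}{m(Q)}\right), \mathrm{Law}\left(\sum_{i \geq 2} Z_i\right)\right) = O\left(\left(\alpha(Q) \ln \frac{1}{\alpha(Q)}\, \ln^{?}|V|\right)^{1/6}\right), \]

where

\[ \alpha(Q) \equiv (1 + q_{\max}\, t^Q_{\mathrm{mix}})\, \pi_{\max} \]

and $d_W$ again denotes $L_1$ Wasserstein distance. In particular,

\[ \mathbb{E}[C] = \left(2 + O\left(\left(\alpha(Q) \ln \frac{1}{\alpha(Q)}\, \ln^{?}|V|\right)^{1/6}\right)\right) m(Q). \]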



We note that this theorem does not imply Theorem 1.1; for instance, it does not work for
two-dimensional discrete tori. However, the well-known formula for $\pi$ over graphs gives the
following corollary:

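Corollary 1.1 (Proof omitted) Assume $G$ is a connected graph with vertex set $V$, where
each vertex $x \in V$ has degree $\deg_G(x)$. Assume that $\varepsilon \in (|V|^{-1}, 1)$ is such that:

\[ \left(\frac{\max_{x \in V} \deg_G(x)}{|V|^{-1} \sum_{x \in V} \deg_G(x)}\right) t^G_{\mathrm{mix}} \leq \frac{\varepsilon\, |V|}{\ln^{?}|V|\, \ln\ln|V|}. \]

Then

\[ d_W\left(\mathrm{Law}\left(\frac{C}{m(G)}\right), \mathrm{Law}\left(\sum_{i \geq 2} Z_i\right)\right) = O\left(\left(\varepsilon \left(1 + \frac{\ln(1/\varepsilon)}{\ln\ln|V|}\right)\right)^{1/6}\right). \]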

This corollary suffices to prove mean field behavior over a variety of examples, such as:

• all graphs with bounded ratio of maximal to average degree and $O(|V|/\ln^{?}|V|)$ mixing
time: this includes expanders [?] and supercritical percolation clusters in $(\mathbb{Z}/L\mathbb{Z})^d$

• all graphs with maximal degree $\leq |V|^{1-\eta}$ ($\eta > 0$ fixed) and mixing time that is polylogarithmic
in $|V|$: this includes the giant component of a typical Erdős-Rényi graph
$G_{n,d/n}$ with $d > 1$ [10] and the models of large networks considered by Durrett [?, 9].
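
For a graph, the parameter $\alpha(Q) = (1 + q_{\max} t^Q_{\mathrm{mix}})\,\pi_{\max}$ of Theorem 1.2 is easy to evaluate once a mixing time estimate is available, because $\pi(x)$ is proportional to the degree of $x$. The Python sketch below is only illustrative: the adjacency dictionary `adj`, the function names, the default `q_max = 1.0` (unit-rate walkers) and the caller-supplied `t_mix` are all assumptions of ours.

```python
def stationary_distribution(adj):
    """pi(x) = deg(x) / (sum of all degrees) for simple random walk on a graph."""
    total = sum(len(nbrs) for nbrs in adj.values())
    return {x: len(nbrs) / total for x, nbrs in adj.items()}

def degree_ratio(adj):
    """Ratio of maximal degree to average degree."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    return max(degrees) / (sum(degrees) / len(degrees))

def alpha(adj, t_mix, q_max=1.0):
    """alpha(Q) = (1 + q_max * t_mix) * pi_max; t_mix must be supplied by the caller."""
    pi_max = max(stationary_distribution(adj).values())
    return (1.0 + q_max * t_mix) * pi_max

cycle = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(alpha(cycle, t_mix=2.0), degree_ratio(star))
```

Note that $\pi_{\max}$ equals the degree ratio divided by $|V|$, which is how the degree condition of Corollary 1.1 enters.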

Let us briefly comment on the case of large networks. Durrett has estimated m(G) in these 
models, and has proven results similar to ours for a bounded number of walkers. We do not 
attempt to compute m(G) here, which in general is a model-specific parameter. However, we 
do show that mean field behavior for C follows from "generic" assumptions about networks 
that hold for many different models. This is important because recent measurements of real-life
social networks [11] suggest that known models of large networks are very inaccurate with
respect to most network characteristics outside of degree distributions and conductance. In
fairness, coalescing random walks and voter models over large networks are not particularly
realistic either, but at the very least we know that mean field behavior is not an artifact of a
particular class of models. We also observe that our Theorem 1.2 works for non-reversible
chains, e.g. random walks on directed graphs.

1.3 Results for the voter model 

The voter model is a very well-known process in the Interacting Particle Systems literature
[13]. The configuration space for the voter model is the power set $\mathcal{O}^V$ of functions $\eta : V \to \mathcal{O}$,
where $V$ is some non-empty set and $\mathcal{O}$ is a non-empty set of possible opinions. The evolution
of the process is determined by numbers $q(x,y)$ ($x, y \in V$, $x \neq y$) and is informally described
as follows: at rate $q(x,y)$, node $x$ copies $y$'s opinion. That is, there is a transition at rate
$q(x,y)$ from any state $\eta : V \to \mathcal{O}$ to the corresponding state $\eta^{x \leftarrow y}$, where:

\[ \eta^{x \leftarrow y}(z) = \begin{cases} \eta(y), & z = x; \\ \eta(z), & \text{for all other } z \in V \setminus \{x\}. \end{cases} \]

A classical duality result relates this voter model to a system of coalescing random walks
with transition rates $q(\cdot,\cdot\cdot)$ and corresponding generator $Q$. More precisely, suppose that
$V = \{x(1), \dots, x(n)\}$ and that $(X_t(i))_{t \geq 0,\, 1 \leq i \leq n}$ is a system of coalescing random walks
evolving according to $Q$ with $X_0(i) = x(i)$ for each $1 \leq i \leq n$.

Duality: Choose $\eta_0 \in \mathcal{O}^V$. Then the configuration

\[ \hat{\eta}_t : x(i) \in V \mapsto \eta_0(X_t(i)) \in \mathcal{O} \quad (1 \leq i \leq n) \]

has the same distribution as the state $\eta_t$ of the voter model at time $t$, when the
initial state is $\eta_0$. In particular, the consensus time for the voter model:

\[ \tau = \inf\{t \geq 0 : \forall i, j \in V,\ \eta_t(i) = \eta_t(j)\} \]

satisfies $\mathbb{E}[\tau] \leq \mathbb{E}[C] < +\infty$.

Now assume that the initial state $\eta_0 \in \mathcal{O}^V$ is random and that the random variables
$\{\eta_0(x)\}_{x \in V}$ are iid and have common law $\mu$ which is not a point mass. In this case one can
show via duality that the law of the consensus time $\tau$ is that of $C_{K \wedge n}$, where $K$ is an $\mathbb{N}$-valued
random variable independent of the coalescing random walks, defined by:

\[ K = \min\{i \in \mathbb{N} : U_{i+1} \neq U_i\}, \quad \text{where } U_1, U_2, \dots \text{ are iid draws from } \mu, \]

and for each $1 \leq k \leq n$

\[ C_k = \min\{t \geq 0 : |\{X_t(i) : 1 \leq i \leq n\}| = k\}. \]
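
The law of $K$ is explicit: $K > i$ holds exactly when $U_1 = \dots = U_{i+1}$, so $\mathbb{P}(K > i) = \sum_{o} \mu(o)^{i+1}$. The Python sketch below evaluates this for a finitely supported $\mu$; representing $\mu$ as a dictionary from opinions to probabilities, and the function names, are illustrative choices of ours.

```python
def tail_of_K(mu, i):
    """P(K > i) = P(U_1 = ... = U_{i+1}) for iid draws from mu (dict: opinion -> prob)."""
    return sum(p ** (i + 1) for p in mu.values())

def pmf_of_K(mu, i):
    """P(K = i) for i >= 1."""
    return tail_of_K(mu, i - 1) - tail_of_K(mu, i)

def mean_of_K(mu):
    """E[K] = sum_{i >= 0} P(K > i) = 1 + sum_o mu(o)^2 / (1 - mu(o))."""
    return 1.0 + sum(p * p / (1.0 - p) for p in mu.values())

fair_coin = {"a": 0.5, "b": 0.5}
print(pmf_of_K(fair_coin, 1), mean_of_K(fair_coin))
```

The formula for the mean uses that $\mu$ is not a point mass, so every $\mu(o)$ is strictly less than 1.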

Thus the key step in analyzing the voter model via our techniques is to prove approximations
for the distribution of $C_k$. Theorems 1.1 and 1.2 imply mean-field behavior for $C = C_1$. A
quick inspection of the proofs reveals that the same bounds for Wasserstein distance can be
obtained for $C_k$, for any $1 \leq k \leq n$. It follows that:

Theorem 1.3 (Proof omitted) Let $V$, $\mathcal{O}$ and $\mu$ be as above, and consider the voter model
defined by $V$, $\mathcal{O}$ and by the generator $Q$ corresponding to transition rates $q(x,y)$. Assume
that the sequence $\{Z_i\}_{i \geq 2}$ is defined as in (1), and also that $K$ has the law described above
and is independent from the $Z_i$. Define $\rho(Q)$ and $\alpha(Q)$ as in Theorems 1.1 and 1.2. Then
the consensus time $\tau$ for this voter model satisfies:

\[ d_W\left(\mathrm{Law}\left(\frac{\tau}{m(Q)}\right), \mathrm{Law}\left(\sum_{i > K} Z_i\right)\right) = \begin{cases} O\left((\rho(Q) \ln(1/\rho(Q)))^{1/6}\right), & \text{if $Q$ is reversible and transitive;} \\ O\left((\alpha(Q) \ln(1/\alpha(Q)) \ln^{?}|V|)^{1/6}\right), & \text{otherwise.} \end{cases} \]



1.4 Main proof ideas 

Our proofs of Theorems 1.1 and 1.2 both start from the formula (1) for the terms in the
distribution of $C$ over $K_n$. Crucially, each term $Z_i$ has a specific meaning: $Z_i$ is the time it
takes for a system with $i$ particles to evolve to a system with $i - 1$ particles, rescaled by the
expected meeting time of two walkers. For $i = 2$, this is just the (rescaled) meeting time
of a pair of particles, which is an exponential random variable with mean 1. For $i > 2$, we
are looking at the first meeting time among $\binom{i}{2}$ pairs of particles. It turns out that these
pairwise meeting times are independent; since the minimum of $k$ independent exponential
random variables with mean $\mu$ is an exponential r.v. with mean $\mu/k$, we deduce that $Z_i$ is
exponential with mean $1/\binom{i}{2}$.
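
The fact about minima of independent exponentials used above can be checked numerically. The following Monte Carlo sketch in Python (the function name and parameters are ours) estimates the mean of the minimum of $k$ independent exponentials with mean $\mu$ and compares it with $\mu/k$.

```python
import random

def mean_of_minimum(k, mu, trials, rng):
    """Monte Carlo estimate of E[min of k independent exponentials with mean mu]."""
    total = 0.0
    for _ in range(trials):
        # expovariate takes the rate, i.e. the reciprocal of the mean.
        total += min(rng.expovariate(1.0 / mu) for _ in range(k))
    return total / trials

rng = random.Random(0)
k = 6  # number of pairs among i = 4 particles
print(mean_of_minimum(k, 1.0, 20000, rng), 1.0 / k)
```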

The bulk of our proof consists of proving something similar for more general chains $Q$.
Fix some such $Q$, with state space $V$, and let $C_i$ denote the time it takes for a system of
coalescing random walks evolving according to $Q$ to have $i$ uncoalesced particles. Clearly,
$M = C_1 - C_2$ is the meeting time of a pair of particles, which is the hitting time of the
diagonal set:

\[ \Delta = \{(x,x) : x \in V\} \]

by the Markov chain $Q^{(2)}$ given by a pair of independent realizations of $Q$. More generally,
$M^{(i+1)} = C_i - C_{i+1}$ is the hitting time of

\[ \Delta^{(i+1)} = \{(x(1), \dots, x(i+1)) : \exists\, 1 \leq i_1 < i_2 \leq i+1,\ x(i_1) = x(i_2)\}. \]

The mean-field picture suggests that each $M^{(i+1)}/m(Q)$ should be close in distribution to $Z_{i+1}$. Indeed,
it is known that:

General principle: Let $H_A$ be the hitting time of a subset $A$ of states. If the
mixing time $t_{\mathrm{mix}}$ is small relative to $\mathbb{E}[H_A]$, then $H_A$ is approximately exponentially
distributed.

This is a general meta-result for small subsets of the state space of a Markov chain; precise
versions (with different quantitative bounds) are proven in [3, 1] when the chain starts from
the stationary distribution. However, we face a few difficulties when trying to use these
off-the-shelf results.

1. For each $i$, $M^{(i+1)}$ is the first hitting time of $\Delta^{(i+1)}$ after time $C_{i+1}$. The random walkers
are not stationary at this random time, so we need to "do" exponential approximation
from non-stationary starting points.

2. In order to get Wasserstein approximations, we need better control of the tail of $M^{(i+1)}$.

3. To prove that $Z_{i+1}$ and $M^{(i+1)}/m(Q)$ are close, we must show something like
$\mathbb{E}[M^{(i+1)}] \approx \mathbb{E}[M]/\binom{i+1}{2}$, i.e. that $M^{(i+1)}$ behaves like the minimum of independent
exponentials.

4. Finally, we should not expect the exponential approximation to hold when $\Delta^{(i+1)}$ is
too large. That means that the "big bang" phase (to use Durrett's phrase) at the
beginning of the process has to be controlled by other means.

It turns out that we can deal with points 1 and 2 via a different kind of exponential
approximation result, stated as Theorem 3.1. This result will give bounds of the following
form:

\[ \mathbb{P}_x(H_A > t) = (1 + o(1)) \exp\left(-\frac{t}{(1 + o(1))\, \mathbb{E}[H_A]}\right). \tag{4} \]

Notice that this holds even for non-stationary starting points $x$ if the chain started from $x$ is
unlikely to hit $A$ before the mixing time. This is discussed in Section 3 below. We also take
some time in that section to develop a specific notion of "near exponential random variable".
Although this takes up some space, we believe it provides a useful framework for tackling
other problems. We note that a version of Theorem 3.1 for stationary initial states is
implicit in [?].

We now turn to point 3. The key difficulty in our setting is that, unlike Cox [7] or Cooper
et al. [6], we do not have a good "local" description of the graphs under consideration which
we could use to compute $\mathbb{E}[M^{(i+1)}]$ directly. We use instead a simple general idea, which we
believe to be new, to address this point. Clearly, $M^{(i+1)}$ is a minimum of $\binom{i+1}{2}$ hitting times.
Let us consider the general problem of understanding the law of:

\[ H_B = \min_{1 \leq i \leq \ell} H_{B_i} \quad \text{where } B = \bigcup_{i=1}^{\ell} B_i, \]

under the assumption that $\mathbb{E}[H_{B_i}] = \mu$ does not depend on $i$ when the initial distribution
is stationary (this covers the case of $M^{(i+1)}$). Assume also that (4) holds for all
$A \in \{B, B_1, B_2, \dots, B_\ell\}$. Then the following holds for $\varepsilon$ in a suitable range:

\[ \forall A \in \{B, B_1, B_2, \dots, B_\ell\} : \mathbb{P}(H_A \leq \varepsilon\, \mathbb{E}[H_A]) \approx \varepsilon. \]

Morally speaking, this means that $\varepsilon\, \mathbb{E}[H_A]$ is the $\varepsilon$-quantile of $H_A$ for all $A$ as above; this is
implicit in [1] and is made explicit in our own Theorem 3.1. Now apply this to $A = B$, with
$\varepsilon$ replaced by $\varepsilon\mu/\mathbb{E}[H_B]$, and obtain:

\[ \frac{\varepsilon\mu}{\mathbb{E}[H_B]} \approx \mathbb{P}(H_B \leq \varepsilon\mu) = \mathbb{P}\left(\bigcup_{i=1}^{\ell} \{H_{B_i} \leq \varepsilon\mu\}\right). \]

If we can show that the pairwise correlations between the events $\{H_{B_i} \leq \varepsilon\mu\}$ are sufficiently
small, then we may obtain:

\[ \frac{\varepsilon\mu}{\mathbb{E}[H_B]} \approx \mathbb{P}\left(\bigcup_{i=1}^{\ell} \{H_{B_i} \leq \varepsilon\mu\}\right) \approx \sum_{i=1}^{\ell} \mathbb{P}(H_{B_i} \leq \varepsilon\mu) \approx \ell\varepsilon. \]

This gives:

\[ \mathbb{E}[H_B] \approx \frac{\mu}{\ell}, \]

as if the times $H_{B_1}, \dots, H_{B_\ell}$ were independent exponentials. The reasoning presented here
is made rigorous and quantitative in Theorem 3.2 below.
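
To see the size of the errors in this heuristic, one can look at the idealized case where the $\ell$ hitting times are exactly independent exponentials with mean $\mu$ (an assumption made only for this illustration). Then $\mathbb{P}(H_B \leq \varepsilon\mu) = 1 - e^{-\ell\varepsilon}$ and each $\mathbb{P}(H_{B_i} \leq \varepsilon\mu) = 1 - e^{-\varepsilon}$. The Python sketch below, with function names of our choosing, compares the union probability, the sum of the individual probabilities, and $\ell\varepsilon$.

```python
import math

def union_probability(ell, eps):
    """P(min_i H_i <= eps * mu) for ell independent exponentials with mean mu."""
    return 1.0 - math.exp(-ell * eps)

def sum_of_probabilities(ell, eps):
    """Sum over i of P(H_i <= eps * mu) for the same exponentials."""
    return ell * (1.0 - math.exp(-eps))

def implied_mean_ratio(ell, eps):
    """E[H_B] / mu as predicted by eps * mu / E[H_B] = P(H_B <= eps * mu)."""
    return eps / union_probability(ell, eps)

ell, eps = 10, 1e-3
print(union_probability(ell, eps), sum_of_probabilities(ell, eps), ell * eps)
print(implied_mean_ratio(ell, eps), 1.0 / ell)
```

In this idealized case all three quantities agree up to a relative error of order $\ell\varepsilon$, which is why $\varepsilon$ has to be taken small compared with $1/\ell$.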

Finally, we need to take care of point 4, i.e. the "big bang" phase. In the setting of Theorem 1.2,
we simply use our results on the coalescence times for smaller numbers of particles,
which seems wasteful but is enough to prove our results. For the reversible/transitive case,
we use a bound from [14] which is of the optimal order. Incidentally, the differences in the
bounds of the two theorems come from this better bound for the big bang phase and from a
more precise control of the correlations between meeting times of different pairs of walkers.



1.5 Outline 

The remainder of the paper is organized as follows. Section 2 contains several preliminaries.
Section 3 contains a general discussion of random variables with nearly exponential distribution
and our general approximation results for hitting times. In Section 4 we apply these
results to the first meeting time among $k$ particles, after proving some technical estimates.
Section 5 contains the formal definition of the coalescing random walks process and proves
mean field behavior for a moderate initial number of walkers. Finally, Section 6 contains the
proofs of Theorems 1.1 and 1.2. Related results and open problems are discussed in the final
sections.

2 Preliminaries



2.1 Basic notation 

We write $\mathbb{N}$ for non-negative integers and $[k] = \{1, 2, \dots, k\}$ for any $k \in \mathbb{N} \setminus \{0\}$. Given a set
$S$, we let $|S|$ denote its cardinality. Moreover, for $k \in \mathbb{N}$, we let:

\[ \binom{S}{k} = \{A \subset S : |A| = k\}. \]

Notice that with this notation:

\[ |S| \text{ finite} \Rightarrow \left|\binom{S}{k}\right| = \binom{|S|}{k} = \frac{|S|!}{k!\,(|S| - k)!}. \]

We will often speak of universal constants $C > 0$. These are numbers that do not depend
on any of the parameters or mathematical objects under consideration in a given problem.
We will also use the notation "$a = O(b)$" in the universal sense prescribed in Notational
convention 1. In this way we can write down expressions such as:

\[ e^{b} = 1 + b + O(b^2) \quad \text{and} \quad \ln\left(\frac{1}{1-b}\right) = b + O(b^2) = O(b). \]

Given a finite set $S$, we let $M_1(S)$ denote the set of all probability measures over $S$.
Given $p, q \in M_1(S)$, their total variation distance is defined as follows:

\[ d_{TV}(p,q) = \frac{1}{2} \sum_{s \in S} |p(s) - q(s)| = \sup_{A \subset S} [p(A) - q(A)], \]

where $p(A) = \sum_{a \in A} p(a)$. If $S$ is not finite, $M_1(S)$ will denote the set of all probability
measures over the "natural" $\sigma$-field over $S$. For instance, for $S = \mathbb{R}$ we consider the Borel
$\sigma$-field, and for $S = \mathbb{D}([0,+\infty), V)$ (see Section 2.3.1 for a definition) we use the $\sigma$-field
generated by projections.
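
The two expressions for the total variation distance on a finite set can be compared directly. The Python sketch below (a brute-force illustration with names of our choosing, only sensible for very small $S$) computes both the half-$L_1$ form and the supremum over all events.

```python
from itertools import chain, combinations

def tv_half_l1(p, q):
    """Half of the L1 distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)

def tv_sup_over_events(p, q):
    """Supremum of p(A) - q(A) over all subsets A of the (small) support."""
    support = sorted(set(p) | set(q))
    events = chain.from_iterable(
        combinations(support, r) for r in range(len(support) + 1)
    )
    return max(sum(p.get(s, 0.0) - q.get(s, 0.0) for s in event) for event in events)

p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}
print(tv_half_l1(p, q), tv_sup_over_events(p, q))
```

The supremum is attained at the event $A = \{s : p(s) > q(s)\}$, which explains why the two values coincide.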

If $X$ is a random variable taking values over $S$, we let $\mathrm{Law}(X) \in M_1(S)$ denote the
distribution (or law) of $X$. Here we again assume that there is a "natural" $\sigma$-field to work
with.

2.2 Wasserstein distance 

The $L_1$ Wasserstein distance (or simply Wasserstein distance) is a metric over probability
measures over $\mathbb{R}$ with finite first moments, given by:

\[ d_W(\lambda_1, \lambda_2) = \int_{\mathbb{R}} |\lambda_1(x,+\infty) - \lambda_2(x,+\infty)|\, dx \quad (\lambda_1, \lambda_2 \in M_1(\mathbb{R})). \]

A classical duality result gives:

\[ d_W(\lambda_1, \lambda_2) = \sup\left\{\int f(x)\, \lambda_1(dx) - \int f(x)\, \lambda_2(dx) : f : \mathbb{R} \to \mathbb{R} \text{ 1-Lipschitz}\right\}. \]

Notational convention 2 Whenever we compute Wasserstein distances, we will assume
that the distributions involved have first moments. This can be checked in each particular
case.

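Remark 1 If $Z_1, Z_2$ are random variables, we sometimes write

$d_W(Z_1, Z_2)$ instead of $d_W(\mathrm{Law}(Z_1), \mathrm{Law}(Z_2))$.

Note that:

\[ d_W(Z_1, Z_2) = \int_{\mathbb{R}} |\mathbb{P}(Z_1 \geq t) - \mathbb{P}(Z_2 \geq t)|\, dt. \]

Also notice that:

\[ |\mathbb{E}[Z_1] - \mathbb{E}[Z_2]| \leq d_W(Z_1, Z_2). \]

This is an equality if $Z_1 \geq 0$ a.s. and $Z_2 = C Z_1$ for some constant $C > 0$:

\[ \forall C \in \mathbb{R},\ d_W(Z_1, C Z_1) = |C - 1|\, \mathbb{E}[Z_1], \tag{5} \]

since $|f(C Z_1) - f(Z_1)| \leq |C - 1|\, Z_1$ for every 1-Lipschitz function $f : \mathbb{R} \to \mathbb{R}$.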
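
Identity (5) can be checked numerically from the tail formula for $d_W$. The Python sketch below is an illustration under our own assumptions: it uses a midpoint rule on a truncated interval $[0, \texttt{upper}]$ (valid for non-negative variables with light tails), and the function names are ours. For $Z_1 =_d \mathrm{Exp}(1)$ and $C = 1.5$, the distance should be close to $|C - 1|\,\mathbb{E}[Z_1] = 0.5$.

```python
import math

def wasserstein_from_tails(tail1, tail2, upper, steps=200000):
    """Midpoint-rule approximation of the integral over [0, upper] of |tail1 - tail2|."""
    h = upper / steps
    return h * sum(
        abs(tail1((j + 0.5) * h) - tail2((j + 0.5) * h)) for j in range(steps)
    )

def exp_tail(m):
    """Tail function t -> P(Z >= t) of an exponential variable with mean m."""
    return lambda t: math.exp(-t / m)

distance = wasserstein_from_tails(exp_tail(1.0), exp_tail(1.5), upper=60.0)
print(distance)
```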

We note here three useful lemmas on Wasserstein distance. These are probably standard, 
but we could not find references for them, so we provide proofs for the latter two lemmas in 
Section A of the Appendix. The first lemma is immediate.

Lemma 2.1 (Sum lemma for Wasserstein distance; proof omitted) For any two random
variables $X, Y$ with finite first moments and defined on the same probability space:

\[ d_W(X, X + Y) \leq \mathbb{E}[|Y|]. \]

For the next lemma, recall that, given two real-valued random variables $X, Y$, we say
that $X$ is stochastically dominated by $Y$, and write $X \preceq_d Y$, if $\mathbb{P}(X > t) \leq \mathbb{P}(Y > t)$ for
all $t \in \mathbb{R}$.

Lemma 2.2 (Sandwich lemma for Wasserstein distance) Let $Z, Z_-, Z_+, W$ be real-valued
random variables with finite first moments and $Z_- \preceq_d Z \preceq_d Z_+$. Then:

\[ d_W(Z, W) \leq d_W(Z_-, W) + d_W(Z_+, W). \]

Lemma 2.3 (Conditional Lemma for Wasserstein distance) Let $W_1, W_2, Z_1, Z_2$ be
real-valued random variables with finite first moments. Assume that $Z_1$ and $Z_2$ are independent
and that $W_1$ is $\mathcal{G}$-measurable for some sub-$\sigma$-field $\mathcal{G}$. Then:

\[ d_W(\mathrm{Law}(W_1 + W_2), \mathrm{Law}(Z_1 + Z_2)) \leq d_W(\mathrm{Law}(W_1), \mathrm{Law}(Z_1)) + \mathbb{E}\left[d_W(\mathrm{Law}(W_2 \mid \mathcal{G}), \mathrm{Law}(Z_2))\right]. \]

Remark 2 Here we are implicitly assuming that $\mathrm{Law}(W_2 \mid \mathcal{G})$ is given by some regular
conditional probability distribution.

2.3 Continuous-time Markov chains

2.3.1 State space and trajectories 

Let $V$ be some non-empty finite set, called the state space. We write $\mathbb{D} = \mathbb{D}([0,+\infty), V)$ for
the set of all paths:

\[ \omega : t \geq 0 \mapsto \omega_t \in V \]

for which there exist $0 = t_0 < t_1 < t_2 < \dots < t_n < \dots$ with $t_n \nearrow +\infty$ and $\omega$ constant over
each interval $[t_n, t_{n+1})$ ($n \in \mathbb{N}$). Such paths will sometimes be called càdlàg.

For each $t \geq 0$, we let $X_t : \mathbb{D} \to V$ be the projection map sending $\omega$ to $\omega_t$. We also define
$X = (X_t)_{t \geq 0}$ as the identity map over $\mathbb{D}$. Whenever we speak about probability measures
and events over $\mathbb{D}$, we will implicitly use the $\sigma$-field $\sigma(\mathbb{D})$ generated by the maps $X_t$, $t \geq 0$.
We define an associated filtration as follows:

\[ \mathcal{F}_t = \sigma(X_s : 0 \leq s \leq t) \quad (t \geq 0). \]

We also define the time-shift operators:

\[ \Theta_T : \omega(\cdot) \in \mathbb{D} \mapsto \omega(\cdot + T) \in \mathbb{D} \quad (T \geq 0). \]

2.3.2 Markov chains and their generators 

Let $q(x,y)$ be non-negative real numbers for each pair $(x,y) \in V^2$ with $x \neq y$. Define a
linear operator $Q : \mathbb{R}^V \to \mathbb{R}^V$, which maps $f \in \mathbb{R}^V$ to $Qf \in \mathbb{R}^V$ satisfying:

\[ (Qf)(x) = \sum_{y \in V \setminus \{x\}} q(x,y)\,(f(x) - f(y)) \quad (x \in V). \]

It is a well-known result that there exists a unique family of probability measures $\{\mathbb{P}_x\}_{x \in V}$
with the properties listed below:

1. for all $x \in V$, $\mathbb{P}_x(X_0 = x) = 1$;

2. for all distinct $x, y \in V$, $\lim_{\varepsilon \searrow 0} \mathbb{P}_x(X_\varepsilon = y)/\varepsilon = q(x,y)$;

3. Markov property: for any $x \in V$ and $T \geq 0$, the conditional law of $X \circ \Theta_T$ given
$\mathcal{F}_T$ under measure $\mathbb{P}_x$, is given by $\mathbb{P}_{X_T}$.

The family $\{\mathbb{P}_x\}_{x \in V}$ satisfying these properties is the Markov chain with generator $Q$. We
will often abuse notation and omit any distinction between a Markov chain and its generator
in our notation.

For $\lambda \in M_1(V)$, $\mathbb{P}_\lambda$ denotes the mixture:

\[ \mathbb{P}_\lambda = \sum_{x \in V} \lambda(x)\, \mathbb{P}_x. \]

This corresponds to starting the process from a random state distributed according to $\lambda$.
For $x \in V$ or $\lambda \in M_1(V)$ and $Y : \mathbb{D} \to S$ a random variable, we let $\mathrm{Law}_x(Y)$ or $\mathrm{Law}_\lambda(Y)$
denote the law of $Y$ under $\mathbb{P}_x$ or $\mathbb{P}_\lambda$ (resp.).

2.3.3 Stationary measures and mixing

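Any Markov chain $Q$ as above has at least one stationary measure $\pi \in M_1(V)$; this is a
measure such that for any $T \geq 0$,

\[ \mathrm{Law}_\pi(X \circ \Theta_T) = \mathrm{Law}_\pi(X). \]

We will only be interested in mixing Markov chains, which are those $Q$ with a unique
stationary measure that satisfy the following condition:

\[ \forall \alpha \in (0,1),\ \exists T \geq 0,\ \forall x \in V : d_{TV}(\mathrm{Law}_x(X_T), \pi) \leq \alpha. \]

The smallest such $T$ is called the $\alpha$-mixing time of $Q$ and is denoted by $t^Q_{\mathrm{mix}}(\alpha)$. By the
Markov property and the definition of total-variation distance, we also have:

\[ \forall \alpha \in (0,1),\ \forall t \geq t^Q_{\mathrm{mix}}(\alpha),\ \forall x \in V,\ \forall \text{ events } S,\ |\mathbb{P}_x(X \circ \Theta_t \in S) - \mathbb{P}_\pi(X \in S)| \leq \alpha. \]

$t^Q_{\mathrm{mix}} \equiv t^Q_{\mathrm{mix}}(1/4)$ is also called the mixing time of $Q$. We note that for all $\varepsilon \in (0, 1/2)$:

\[ t^Q_{\mathrm{mix}}(\varepsilon) \leq C \ln(1/\varepsilon)\, t^Q_{\mathrm{mix}}, \tag{6} \]

where $C > 0$ is universal (this is proven in [?, Section 4.5] for discrete time chains, but the
same argument works here).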
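
For very small chains, the $\alpha$-mixing time can be computed numerically. The Python sketch below is one possible approach under our own assumptions, not a method from the paper: it approximates the time-$t$ transition matrix by uniformization (a truncated Poisson mixture of powers of a jump matrix), takes the stationary distribution `pi` as an input, and finds $t^Q_{\mathrm{mix}}(\alpha)$ by bisection, using that the worst-case total variation distance is non-increasing in $t$. All function names are hypothetical.

```python
import math

def transition_matrix(rates, t, terms=200):
    """Approximate the time-t transition matrix by uniformization.

    rates[x][y] is the jump rate q(x, y) for x != y; diagonal entries are ignored.
    """
    n = len(rates)
    out = [sum(rates[x][y] for y in range(n) if y != x) for x in range(n)]
    lam = max(out)
    # Jump matrix of the uniformized chain: events at rate lam, possibly staying put.
    jump = [[rates[x][y] / lam if y != x else 1.0 - out[x] / lam for y in range(n)]
            for x in range(n)]
    power = [[float(x == y) for y in range(n)] for x in range(n)]
    result = [[0.0] * n for _ in range(n)]
    weight = math.exp(-lam * t)  # Poisson(lam * t) probability of zero events
    for k in range(terms):
        for x in range(n):
            for y in range(n):
                result[x][y] += weight * power[x][y]
        power = [[sum(power[x][z] * jump[z][y] for z in range(n)) for y in range(n)]
                 for x in range(n)]
        weight *= lam * t / (k + 1)
    return result

def worst_case_tv(rates, pi, t):
    """max over starting states x of the total variation distance to pi at time t."""
    matrix = transition_matrix(rates, t)
    return max(0.5 * sum(abs(row[y] - pi[y]) for y in range(len(pi))) for row in matrix)

def mixing_time(rates, pi, alpha=0.25, tol=1e-9):
    """Smallest t with worst-case TV distance at most alpha, found by bisection."""
    hi = 1.0
    while worst_case_tv(rates, pi, hi) > alpha:
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if worst_case_tv(rates, pi, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

two_state = [[0.0, 1.0], [1.0, 0.0]]
print(mixing_time(two_state, [0.5, 0.5]))
```

For the symmetric two-state chain with unit rates, the distance at time $t$ is $e^{-2t}/2$, so the $1/4$-mixing time is $\ln(2)/2$, which the sketch reproduces.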

2.3.4 Product chains 

Letting $Q$ be as above, we may consider the joint trajectory of $k$ independent realizations of
$Q$:

\[ X^{(k)}_t = (X_t(1), \dots, X_t(k)) \quad (t \geq 0), \]

where each $(X_t(i))_{t \geq 0}$ has law $\mathbb{P}_{x(i)}$. It turns out that this corresponds to a Markov chain
$Q^{(k)}$ on $V^k$ with transition rates:

\[ q^{(k)}(x^{(k)}, y^{(k)}) = \begin{cases} q(x(i), y(i)), & \text{if } x(i) \neq y(i) \wedge \forall j \in [k] \setminus \{i\},\ x(j) = y(j); \\ 0, & \text{otherwise.} \end{cases} \]

Remark 3 In what follows we will always denote elements of $V^k$ (resp. $M_1(V^k)$) by symbols
like $x^{(k)}, y^{(k)}, \dots$ (resp. $\lambda^{(k)}, \rho^{(k)}, \dots$). We will then denote the distribution of $Q^{(k)}$ started
from $x^{(k)}$ or $\lambda^{(k)}$ by $\mathbb{P}_{x^{(k)}}$ or $\mathbb{P}_{\lambda^{(k)}}$. This is a slight abuse of our convention for the $Q$ chain,
but the initial state/distribution will always make it clear that we are referring to the product
chain.

The following result on $Q^{(k)}$ will often be useful.

Lemma 2.4 Assume $Q$ is mixing and has (unique) stationary distribution $\pi$. Then $Q^{(k)}$ is
also mixing, and the product measure $\pi^{\otimes k}$ is its (unique) stationary distribution. Moreover,
the mixing times of $Q^{(k)}$ satisfy:

\[ \forall \alpha \in (0, 1/2),\ t^{Q^{(k)}}_{\mathrm{mix}}(\alpha) \leq t^Q_{\mathrm{mix}}(\alpha/k) \leq C \ln(k/\alpha)\, t^Q_{\mathrm{mix}}, \]

with $C > 0$ universal.

Proof: [Sketch] Notice that the law of $X^{(k)}_T$ has a product form:

\[ \mathrm{Law}_{x^{(k)}}(X^{(k)}_T) = \mathrm{Law}_{x(1)}(X_T) \otimes \mathrm{Law}_{x(2)}(X_T) \otimes \dots \otimes \mathrm{Law}_{x(k)}(X_T). \]

It is well-known (and not hard to show) that the total-variation distance between product
measures is at most the sum of the distances of the factors. This gives:

\[ d_{TV}(\mathrm{Law}_{x^{(k)}}(X^{(k)}_T), \pi^{\otimes k}) \leq \sum_{i=1}^{k} d_{TV}(\mathrm{Law}_{x(i)}(X_T), \pi). \]

The RHS is $\leq \alpha$ if each term in the sum is at most $\alpha/k$. This is achieved when $T \geq
t^Q_{\mathrm{mix}}(\alpha/k)$; (6) then finishes the proof. □



3 Nearly exponential hitting times 
3.1 Basic definitions 

We first recall a standard definition: the exponential distribution with mean $m > 0$, denoted
by $\mathrm{Exp}(m)$, is the unique probability distribution $\mu \in M_1(\mathbb{R})$ such that, if $Z$ is a random
variable with law $\mu$,

\[ \mathbb{P}(Z \geq t) = e^{-t/m} \quad (t \geq 0). \]

We write $Z =_d \mathrm{Exp}(m)$ when $Z$ is a random variable with $\mathrm{Law}(Z) = \mathrm{Exp}(m)$.

Similarly, given $m > 0$ as above and parameters $\alpha > 0$, $\beta \in (0,1)$, we say that a measure
$\mu \in M_1(\mathbb{R})$ has distribution $\mathrm{Exp}(m, \alpha, \beta)$ if it is the law of a random variable $Z$ with $Z \geq 0$
almost surely and for all $t \geq 0$:

\[ (1 - \alpha)\, e^{-\frac{t}{(1-\beta)\, m}} \leq \mathbb{P}(Z \geq t) \leq (1 + \alpha)\, e^{-\frac{t}{(1+\beta)\, m}}. \]

We will write $\mu = \mathrm{Exp}(m, \alpha, \beta)$ or $Z =_d \mathrm{Exp}(m, \alpha, \beta)$ as a shorthand for this. Notice that
$\mathrm{Exp}(m, \alpha, \beta)$ does not denote a single distribution, but rather a family of distributions that
obey the above property, but we will mostly neglect this minor issue.
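
To make the definition concrete, the Python sketch below tests the two-sided bound defining $\mathrm{Exp}(m, \alpha, \beta)$ on a finite grid of times. This is only a necessary check (the definition requires all $t \geq 0$), and the function name and the grid are illustrative choices of ours. As an example, an exact exponential with mean $1.05$ passes the test for $\mathrm{Exp}(1, 0, 0.1)$ but fails it for $\mathrm{Exp}(1, 0, 0.01)$.

```python
import math

def in_exp_family(tail, m, alpha, beta, grid):
    """Check (1-alpha) e^{-t/((1-beta)m)} <= tail(t) <= (1+alpha) e^{-t/((1+beta)m)} on grid."""
    for t in grid:
        lower = (1.0 - alpha) * math.exp(-t / ((1.0 - beta) * m))
        upper = (1.0 + alpha) * math.exp(-t / ((1.0 + beta) * m))
        if not (lower <= tail(t) <= upper):
            return False
    return True

grid = [0.1 * j for j in range(200)]
tail = lambda t: math.exp(-t / 1.05)  # exact exponential with mean 1.05
print(in_exp_family(tail, 1.0, 0.0, 0.1, grid))
print(in_exp_family(tail, 1.0, 0.0, 0.01, grid))
```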

Random variables with law $\mathrm{Exp}(m, \alpha, \beta)$ will naturally appear in our study of hitting times
of Markov chains. We compile here some simple results about them. The first proposition
is trivial and we omit its proof.

Proposition 3.1 (Proof omitted) If $\mu \in M_1(\mathbb{R})$ satisfies $\mu = \mathrm{Exp}(m, \alpha, \beta)$ and $m' > 0$,
$\gamma \in (0,1)$ are such that $\beta + \gamma + \beta\gamma < 1$,

\[ (1 - \gamma)\, m' \leq m \leq (1 + \gamma)\, m', \]

then

\[ \mu = \mathrm{Exp}(m', \alpha, \beta + \gamma + \beta\gamma). \]

We now show that random variables $\mathrm{Exp}(m, \alpha, \beta)$ are close to the corresponding exponentials.

Lemma 3.1 (Wasserstein distance error for $\mathrm{Exp}(m, \alpha, \beta)$) We have the following
inequality for all $\alpha > 0$, $0 < \beta < 1$:

\[ d_W(\mathrm{Exp}(m), \mathrm{Exp}(m, \alpha, \beta)) \leq 2(\alpha + \beta)\, m. \]

That is, if $Z =_d \mathrm{Exp}(m, \alpha, \beta)$, the Wasserstein distance between $\mathrm{Law}(Z)$ and $\mathrm{Exp}(m)$ is at
most $2\alpha m + 2\beta m$.

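Proof: Assume $Z =_d \mathrm{Exp}(m, \alpha, \beta)$ and $\tilde{Z} =_d \mathrm{Exp}(m)$ are given. By convexity:

\[ d_W(Z, \tilde{Z}) = \int_0^{\infty} \left|\mathbb{P}(Z \geq t) - e^{-t/m}\right| dt \]
\[ \leq \int_0^{\infty} \max_{\xi \in \{-1,+1\}} \left|(1 + \xi\alpha)_+\, e^{-\frac{t}{(1+\xi\beta)\, m}} - e^{-t/m}\right| dt \]
\[ \leq \int_0^{\infty} \left|(1 + \alpha)\, e^{-\frac{t}{(1+\beta)\, m}} - e^{-t/m}\right| dt + \int_0^{\infty} \left|(1 - \alpha)_+\, e^{-\frac{t}{(1-\beta)\, m}} - e^{-t/m}\right| dt \]
\[ =: (I) + (II). \]

For the first term in the RHS, we note that

\[ \forall t \geq 0,\ (1 + \alpha)\, e^{-\frac{t}{(1+\beta)\, m}} - e^{-t/m} \geq 0, \]

hence:

\[ (I) = \int_0^{\infty} \left\{(1 + \alpha)\, e^{-\frac{t}{(1+\beta)\, m}} - e^{-t/m}\right\} dt = [\alpha + \beta + \alpha\beta]\, m. \]

Similarly, for term $(II)$ we have:

\[ \forall t \geq 0,\ (1 - \alpha)_+\, e^{-\frac{t}{(1-\beta)\, m}} - e^{-t/m} \leq 0, \]

hence:

\[ (II) = \int_0^{\infty} \left\{e^{-t/m} - (1 - \alpha)_+\, e^{-\frac{t}{(1-\beta)\, m}}\right\} dt \leq [\alpha + \beta - \alpha\beta]\, m. \]

Hence:

\[ d_W(Z, \tilde{Z}) \leq (I) + (II) \leq 2(\alpha + \beta)\, m. \]

□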
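
The two integrals in this proof have closed forms: the first equals $(1+\alpha)(1+\beta)m - m = [\alpha + \beta + \alpha\beta]\,m$, and for $\alpha \leq 1$ the second equals $m - (1-\alpha)(1-\beta)m = [\alpha + \beta - \alpha\beta]\,m$. The Python sketch below checks these values by numerical integration; the midpoint rule, the truncation point `upper` and the function names are our own illustrative choices.

```python
import math

def integrate(f, upper, steps=200000):
    """Midpoint-rule approximation of the integral of f over [0, upper]."""
    h = upper / steps
    return h * sum(f((j + 0.5) * h) for j in range(steps))

def term_one(m, alpha, beta, upper=200.0):
    """Integral of (1+alpha) e^{-t/((1+beta)m)} - e^{-t/m} over [0, upper]."""
    return integrate(
        lambda t: (1 + alpha) * math.exp(-t / ((1 + beta) * m)) - math.exp(-t / m),
        upper,
    )

def term_two(m, alpha, beta, upper=200.0):
    """Integral of e^{-t/m} - (1-alpha)_+ e^{-t/((1-beta)m)} over [0, upper]."""
    positive_part = max(1 - alpha, 0.0)
    return integrate(
        lambda t: math.exp(-t / m) - positive_part * math.exp(-t / ((1 - beta) * m)),
        upper,
    )

m, alpha, beta = 2.0, 0.1, 0.2
print(term_one(m, alpha, beta), (alpha + beta + alpha * beta) * m)
print(term_two(m, alpha, beta), (alpha + beta - alpha * beta) * m)
```

Adding the two closed forms gives exactly $2(\alpha + \beta)\,m$ when $\alpha \leq 1$, which is the bound of Lemma 3.1.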
3.2 Hitting times are nearly exponential

In this section we consider a mixing continuous-time Markov chain $\{\mathbb{P}_x\}_{x \in V}$ with generator
$Q$, taking values over a finite state space $V$, with unique stationary distribution $\pi$. Given a
nonempty $A \subset V$ with $\pi(A) > 0$, we define the hitting time of $A$ to be:

\[ H_A(\omega) = \inf\{t \geq 0 : \omega(t) \in A\} \quad (\omega \in \mathbb{D}([0,+\infty), V)). \]

The condition $\pi(A) > 0$ ensures that $\mathbb{E}_x[H_A] < +\infty$ for all $x \in V$. We will

Our first result in this section presents sufficient conditions on $A$ and $\mu \in M_1(V)$ that
ensure that $H_A$ is approximately exponentially distributed.

Theorem 3.1 In the above Markov chain setting, assume that $0 < \varepsilon < \delta < 1/5$ are such
that:

\[ \mathbb{P}_\pi(H_A \leq t^Q_{\mathrm{mix}}(\delta\varepsilon)) \leq \delta\varepsilon. \]

Let $t_\varepsilon(A)$ be the $\varepsilon$-quantile of $\mathrm{Law}_\pi(H_A)$, i.e. the unique number $t_\varepsilon(A) \in [0,+\infty)$ with
$\mathbb{P}_\pi(H_A \leq t_\varepsilon(A)) = \varepsilon$ (this is well-defined since $\mathbb{P}_\pi(H_A \leq t)$ is a continuous and strictly
increasing function of $t$ in our setting). Given $\lambda \in M_1(V)$, write:

\[ r_\lambda \equiv \mathbb{P}_\lambda(H_A \leq t^Q_{\mathrm{mix}}(\delta\varepsilon)). \]

Then:

\[ \mathrm{Law}_\lambda(H_A) = \mathrm{Exp}\left(\frac{t_\varepsilon(A)}{\varepsilon},\ O(\varepsilon) + 2r_\lambda,\ O(\delta)\right). \]

Moreover,

\[ \left|\frac{\varepsilon\, \mathbb{E}_\pi[H_A]}{t_\varepsilon(A)} - 1\right| = O(\delta) \]

and:

\[ \mathrm{Law}_\lambda(H_A) = \mathrm{Exp}\left(\mathbb{E}_\pi[H_A],\ O(\varepsilon) + 2r_\lambda,\ O(\delta)\right). \]

We emphasize that results similar to this are not new in the literature [?, 3], but the
lower-tail part of our result does not seem to be explicit anywhere. The proof is strongly
related to that in [?], but we wish to stress the relationship between the quantile $t_\varepsilon(A)$ and
the exponential approximation, which we will need below.

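The second result considers what happens when we have a union of events 

\[ A = A_1 \cup A_2 \cup \dots \cup A_\ell. \]

As described in the introduction, we give a sufficient condition under which the hitting time 
$H_A$ behaves like a minimum of independent exponentials. 

Theorem 3.2 Assume that the set $A$ considered above can be written as: 

\[ A = \bigcup_{i=1}^{\ell} A_i \]

where the sets $A_1, \dots, A_\ell$ are non-empty and 

\[ m := \mathbb{E}_\pi[H_{A_1}] = \mathbb{E}_\pi[H_{A_2}] = \dots = \mathbb{E}_\pi[H_{A_\ell}]. \]
Assume $0 < \delta < 1/5$, $0 < \varepsilon < \delta/2\ell$ are such that for all $1 \leq i \leq \ell$: 

\[ \mathbb{P}_\pi\left(H_{A_i} \leq t^Q_{\mathrm{mix}}(\delta\varepsilon/2)\right) \leq \frac{\delta\varepsilon}{2}. \]

Then for all $\lambda \in M_1(V)$, 

\[ \mathrm{Law}_\lambda(H_A) = \mathrm{Exp}\left(\frac{m}{\ell},\ 2r_\lambda + O(\ell\varepsilon),\ O(\delta + \xi)\right) \]

where 

\[ r_\lambda \equiv \mathbb{P}_\lambda\left(H_A \leq t^Q_{\mathrm{mix}}(\delta\varepsilon)\right), \]

and: 

\[ \xi \equiv \frac{1}{\ell\varepsilon} \sum_{1 \leq i < j \leq \ell} \mathbb{P}_\pi\left(H_{A_i} \leq \varepsilon m,\ H_{A_j} \leq \varepsilon m\right). \]

Remark 4 If the $H_{A_i}$ are in fact independent, then $\xi = O(\varepsilon\ell)$. 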
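Remark 4 can be made concrete in the idealized case where the $H_{A_i}$ are exactly independent exponentials of mean $m$. The following sketch works under that assumption, which is stronger than anything the theorem requires, and its function name is hypothetical. It computes the union bound, the Bonferroni lower bound and $\xi$, so that they can be compared with $\ell\varepsilon$.

```python
import math

def union_of_independent_exponentials(ell, eps):
    """Bounds on P(min_i H_i <= eps*m) for ell i.i.d. exponentials of mean m."""
    p_single = 1.0 - math.exp(-eps)            # P(H_i <= eps*m); does not depend on m
    exact = 1.0 - math.exp(-ell * eps)         # exact law of the minimum
    upper = ell * p_single                     # union bound
    pairs = ell * (ell - 1) // 2               # number of pairs i < j
    lower = upper - pairs * p_single ** 2      # Bonferroni lower bound
    xi = pairs * p_single ** 2 / (ell * eps)   # xi of Theorem 3.2 under independence
    return exact, lower, upper, xi
```

Under independence each pair contributes about $\varepsilon^2$, so $\xi$ is at most $\ell\varepsilon/2$, in line with the $O(\varepsilon\ell)$ of the remark.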

The remainder of the section is devoted to the proof of these two results. 

3.3 Hitting time of a single set: proofs 

We first present the proof of Theorem 3.1 modulo two important Lemmas, and subsequently 
prove those Lemmas. 
Proof: [of Theorem 3.1] 

Let $\lambda \in M_1(V)$ be arbitrary. Throughout the proof we will assume implicitly that 
$\delta + r_\lambda + \varepsilon$ is smaller than some sufficiently small absolute constant; the remaining case is 
easy to handle by increasing the value of $C_0$ if necessary. 

We begin with an upper bound for $\mathbb{P}_\lambda(H_A > t)$ in terms of $t_\varepsilon(A)$. 

Lemma 3.2 (Proven in Section 3.3.1) Under the assumptions of Theorem 3.1, 

\[ \forall t \geq 0,\quad \mathbb{P}_\lambda(H_A > t) \leq (1 + O(\varepsilon))\, e^{-(1 + O(\delta))\frac{\varepsilon t}{t_\varepsilon(A)}}. \]

In particular, this implies: 

\[ \forall \lambda \in M_1(V),\quad \mathbb{E}_\lambda[H_A] = \int_0^{+\infty} \mathbb{P}_\lambda(H_A > t)\,dt \leq (1 + O(\delta))\,\frac{t_\varepsilon(A)}{\varepsilon}. \qquad (7) \]

It turns out that the upper bound in the above Lemma can be nearly reversed if we start 
from some distribution that is "far" from $A$. 

Lemma 3.3 (Proven in Section 3.3.2) With the assumptions of Theorem 3.1, if $2\varepsilon + r_\lambda < 1/2$, 

\[ \forall t \geq 0,\quad \mathbb{P}_\lambda(H_A > t) \geq (1 - O(\varepsilon) - r_\lambda)\, e^{-(1 + O(\delta))\frac{\varepsilon t}{t_\varepsilon(A)}}. \]

Notice that the combination of these two lemmas already implies the first statement in the 
Theorem, as it shows that: 

\[ \forall t \geq 0,\quad \mathbb{P}_\lambda(H_A > t) \in \left[(1 - O(\varepsilon) - 2r_\lambda)\, e^{-(1 + O(\delta))\frac{\varepsilon t}{t_\varepsilon(A)}},\ (1 + O(\varepsilon))\, e^{-(1 + O(\delta))\frac{\varepsilon t}{t_\varepsilon(A)}}\right]. \]

To see this, notice that the upper bound is always valid by Lemma 3.2. For the lower bound, 
we use Lemma 3.3 if $2\varepsilon + r_\lambda < 1/2$, and note that the lower bound is $\leq 0$ if $2\varepsilon + r_\lambda \geq 1/2$ and 
the constant in the $O(\varepsilon)$ term is at least 4. 

We now prove the assertion about expectations in the Theorem. We use Lemma 3.1 and 
deduce: 

\[ \left|\mathbb{E}_\pi[H_A] - \varepsilon^{-1} t_\varepsilon(A)\right| \leq d_W\left(\mathrm{Law}_\pi(H_A), \mathrm{Exp}(\varepsilon^{-1} t_\varepsilon(A))\right) \leq O(\delta + r_\pi)\, \varepsilon^{-1} t_\varepsilon(A) \]
and the assertion follows from dividing by $\varepsilon^{-1} t_\varepsilon(A)$ and noting that 

\[ r_\pi = \mathbb{P}_\pi\left(H_A \leq t^Q_{\mathrm{mix}}(\delta\varepsilon)\right) \leq \delta\varepsilon \]
by assumption. The final assertion in the Theorem then follows from Proposition 3.1. □ 

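3.3.1 Proof of Lemma 3.2 

Proof: Set $T = t^Q_{\mathrm{mix}}(\delta\varepsilon)$. We note for later reference that $T \leq t_\varepsilon(A)$, since 

\[ \mathbb{P}_\pi(H_A \leq T) \leq \delta\varepsilon < \varepsilon = \mathbb{P}_\pi(H_A \leq t_\varepsilon(A)). \]
Our main goal will be to show the following inequality: 

\[ \text{Goal}: \quad \forall k \in \mathbb{N},\quad \mathbb{P}_\lambda(H_A > (k+1)\,t_\varepsilon(A)) \leq (1 - \varepsilon + 2\delta\varepsilon)\, \mathbb{P}_\lambda(H_A > k\,t_\varepsilon(A)). \qquad (8) \]
Once established, this goal will imply: 

\[ \forall k \in \mathbb{N},\quad \mathbb{P}_\lambda(H_A > k\,t_\varepsilon(A)) \leq (1 - \varepsilon + 2\delta\varepsilon)^k \]

and 

\[ \forall t \geq 0,\quad \mathbb{P}_\lambda(H_A > t) \leq e^{-(1 + O(\delta))\,\varepsilon \lfloor t/t_\varepsilon(A) \rfloor} = (1 + O(\varepsilon))\, e^{-(1 + O(\delta))\frac{\varepsilon t}{t_\varepsilon(A)}}, \]
which is the desired result. To achieve the goal, we fix some $k \in \mathbb{N}$ and use $T \leq t_\varepsilon(A)$ to 
bound: 

\[ \mathbb{P}_\lambda(H_A > (k+1)\,t_\varepsilon(A)) \leq \mathbb{P}_\lambda\left(H_A > k\,t_\varepsilon(A),\ H_A \circ \Theta_{k t_\varepsilon(A) + T} > t_\varepsilon(A) - T\right) \qquad (9) \]
\[ \text{(Markov prop.)} \quad = \mathbb{P}_\lambda(H_A > k\,t_\varepsilon(A))\, \mathbb{P}_{\tilde\lambda}(H_A > t_\varepsilon(A) - T) \]

where $\tilde\lambda$ is the law of $X_{k t_\varepsilon(A) + T}$ conditioned on $\{H_A > k\,t_\varepsilon(A)\}$. Since this event belongs to 
$\mathcal{F}_{k t_\varepsilon(A)}$ and $T = t^Q_{\mathrm{mix}}(\delta\varepsilon)$, $\tilde\lambda$ is $\delta\varepsilon$-close to $\pi$ in total variation distance. We deduce: 

\[ \frac{\mathbb{P}_\lambda(H_A > (k+1)\,t_\varepsilon(A))}{\mathbb{P}_\lambda(H_A > k\,t_\varepsilon(A))} \leq \mathbb{P}_\pi(H_A > t_\varepsilon(A) - T) + \delta\varepsilon. \qquad (10) \]

Now observe that: 

\[ \mathbb{P}_\pi(H_A > t_\varepsilon(A) - T) \leq \mathbb{P}_\pi(H_A > t_\varepsilon(A)) + \mathbb{P}_\pi\left(H_A \in (t_\varepsilon(A) - T, t_\varepsilon(A)]\right) \]
\[ \leq \mathbb{P}_\pi(H_A > t_\varepsilon(A)) + \mathbb{P}_\pi\left(H_A \circ \Theta_{t_\varepsilon(A) - T} \leq T\right) \]
\[ \text{(defn. of } t_\varepsilon(A)\text{)} \quad = 1 - \varepsilon + \mathbb{P}_\pi\left(H_A \circ \Theta_{t_\varepsilon(A) - T} \leq T\right) \]
\[ \text{(}\pi\text{ stationary)} \quad = 1 - \varepsilon + \mathbb{P}_\pi(H_A \leq T) \]
\[ \text{(}T = t^Q_{\mathrm{mix}}(\delta\varepsilon)\text{ + assumption)} \quad \leq 1 - \varepsilon + \delta\varepsilon, \]

and plugging this into (10) gives: 

\[ \frac{\mathbb{P}_\lambda(H_A > (k+1)\,t_\varepsilon(A))}{\mathbb{P}_\lambda(H_A > k\,t_\varepsilon(A))} \leq 1 - \varepsilon(1 - 2\delta), \]

as desired. □ 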
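The passage from the geometric bound $(1 - \varepsilon + 2\delta\varepsilon)^k$ to an exponential tail with a $(1 + O(\varepsilon))$ prefactor uses only $1 - a \leq e^{-a}$ and $\lfloor x \rfloor \geq x - 1$. A minimal numerical sketch of that step follows; the function name is hypothetical, and the rate $(1 - 2\delta)\varepsilon$ is used as one concrete instance of $(1 + O(\delta))\varepsilon$.

```python
import math

def geometric_vs_exponential(eps, delta, x):
    """Compare (1 - eps + 2*delta*eps)**floor(x) with its exponential upper bound.

    Here x stands for t / t_eps(A).
    """
    k = math.floor(x)
    geometric = (1.0 - eps + 2.0 * delta * eps) ** k
    rate = (1.0 - 2.0 * delta) * eps               # concrete (1 + O(delta)) * eps
    bound = math.exp(rate) * math.exp(-rate * x)   # prefactor exp(rate) = 1 + O(eps)
    return geometric, bound
```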



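3.3.2 Proof of Lemma 3.3 

Proof: The general scheme of the proof is similar to that of Lemma 3.2, but we will need to 
be a bit more careful in our estimates. In particular, we will need that $(1 + 5\delta)\varepsilon < 1/2$ and 
$2\varepsilon + r_\lambda < 1/2$. 

Define $T = t^Q_{\mathrm{mix}}(\delta\varepsilon)$ as in the proof of Lemma 3.2 in Section 3.3.1. Again observe that 
$T \leq t_\varepsilon(A)$. Define: 

\[ f(k) \equiv \mathbb{P}_\lambda(H_A > k\,t_\varepsilon(A)) \quad (k \in \mathbb{N}). \]

Clearly, $f(0) = 1$ and: 

\[ f(1) \geq \mathbb{P}_\lambda(H_A \circ \Theta_T > t_\varepsilon(A)) - \mathbb{P}_\lambda(H_A \leq T) \geq 1 - \varepsilon - \delta\varepsilon - r_\lambda \geq 1 - 2\varepsilon - r_\lambda \qquad (11) \]
since $T = t^Q_{\mathrm{mix}}(\delta\varepsilon)$ and by the properties of mixing times: 

\[ \mathbb{P}_\lambda(H_A \circ \Theta_T > t_\varepsilon(A)) \geq \mathbb{P}_\pi(H_A > t_\varepsilon(A)) - \delta\varepsilon. \]

We now claim that: 

Claim 3.1 For all $k \in \mathbb{N} \setminus \{0\}$, 

\[ \frac{f(k+1)}{f(k)} \geq 1 - \varepsilon - 5\delta\varepsilon. \]

Notice that the Claim and (11) imply: 

\[ \forall t \geq 0,\quad \mathbb{P}_\lambda(H_A > t) \geq f(\lceil t/t_\varepsilon(A) \rceil) \]
\[ \geq (1 - 2\varepsilon - r_\lambda)\,(1 - \varepsilon - 5\delta\varepsilon)^{\lceil t/t_\varepsilon(A) \rceil - 1} \]
\[ \geq (1 - O(\varepsilon) - r_\lambda)\,(1 - \varepsilon - 5\delta\varepsilon)^{t/t_\varepsilon(A)} \geq (1 - O(\varepsilon) - r_\lambda)\, e^{-(1 + O(\delta))\frac{\varepsilon t}{t_\varepsilon(A)}}, \]

which is precisely the bound we wish to prove. We spend the rest of this proof proving the 
Claim. 

Fix some $k \geq 1$ and notice that: 

\[ f(k+1) \geq \mathbb{P}_\lambda\left(H_A > k\,t_\varepsilon(A),\ H_A \circ \Theta_{k t_\varepsilon(A) + T} > t_\varepsilon(A) - T\right) - \mathbb{P}_\lambda\left(H_A > k\,t_\varepsilon(A),\ H_A \circ \Theta_{k t_\varepsilon(A)} \leq T\right) =: (I) - (II). \qquad (12) \]
We bound the two terms $(I)$, $(II)$ separately. By the Markov property, 

\[ (I) = \mathbb{P}_\lambda(H_A > k\,t_\varepsilon(A))\, \mathbb{P}_{\tilde\lambda}(H_A > t_\varepsilon(A) - T) \]

where $\tilde\lambda$ is the conditional law of $X_{k t_\varepsilon(A) + T}$ given $H_A > k\,t_\varepsilon(A)$. Since $T = t^Q_{\mathrm{mix}}(\varepsilon\delta)$, $\tilde\lambda$ is 
within distance $\delta\varepsilon$ from $\pi$. We deduce: 

\[ (I) \geq \mathbb{P}_\lambda(H_A > k\,t_\varepsilon(A))\, \left(\mathbb{P}_\pi(H_A > t_\varepsilon(A)) - \delta\varepsilon\right) = f(k)\,(1 - \varepsilon - \delta\varepsilon). \qquad (13) \]

We now upper bound term $(II)$ in (12). Notice that (again because of the Markov property): 

\[ (II) \leq \mathbb{P}_\lambda\left(H_A > (k-1)\,t_\varepsilon(A),\ H_A \circ \Theta_{k t_\varepsilon(A)} \leq T\right) = f(k-1)\, \mathbb{P}_{\lambda'}(H_A \leq T) \]

where $\lambda'$ is the law of $X_{k t_\varepsilon(A)}$ conditioned on $\{H_A > (k-1)\,t_\varepsilon(A)\}$. Recalling that $t_\varepsilon(A) \geq T = 
t^Q_{\mathrm{mix}}(\delta\varepsilon)$, we see that $\lambda'$ is $\delta\varepsilon$-close to $\pi$. Since we have also assumed that $\mathbb{P}_\pi(H_A \leq T) \leq \delta\varepsilon$, 
we deduce: 

\[ (II) \leq f(k-1)\,\left(\mathbb{P}_\pi(H_A \leq T) + \delta\varepsilon\right) \leq 2\delta\varepsilon\, f(k-1). \]
We combine this with (13) and (12) to obtain: 

\[ \forall k \in \mathbb{N} \setminus \{0, 1\},\quad f(k+1) \geq f(k)\,(1 - \varepsilon - \delta\varepsilon) - f(k-1)\,(2\delta\varepsilon). \]

One can argue inductively that $f(k)/f(k-1) \geq 1/2$ for all $k \geq 1$. Indeed, this holds for 
$k \geq 2$ by the Claim applied to $k - 1$. For $k = 1$ we may use (11) and the assumption on 
$2\varepsilon + r_\lambda$ to deduce the same result. Applying this to the previous inequality we obtain 

\[ \forall k \in \mathbb{N} \setminus \{0\},\quad f(k+1) \geq f(k)\,(1 - \varepsilon - 5\delta\varepsilon), \]

which finishes the proof of the Claim and of the Lemma. □ 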
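The induction in the Claim can be traced numerically on a sequence that satisfies the two-step recursion with equality, started from the extreme values $f(0) = 1$ and $f(1) = 1 - 2\varepsilon - r_\lambda$. This is only an illustration of how the bound $f(k)/f(k-1) \geq 1/2$ propagates, not a substitute for the proof; the function name and the parameter `r` (standing for $r_\lambda$) are hypothetical.

```python
def worst_case_ratios(eps, delta, r, steps=50):
    """Ratios f(k+1)/f(k), k >= 1, for the recursion taken with equality:

        f(k+1) = f(k) * (1 - eps - delta*eps) - 2*delta*eps * f(k-1).
    """
    # The two standing assumptions used in the proof of Lemma 3.3.
    assert (1 + 5 * delta) * eps < 0.5 and 2 * eps + r < 0.5
    f_prev, f_cur = 1.0, 1.0 - 2.0 * eps - r   # f(0) and the lower bound (11) on f(1)
    ratios = []
    for _ in range(steps):
        f_next = f_cur * (1.0 - eps - delta * eps) - 2.0 * delta * eps * f_prev
        ratios.append(f_next / f_cur)
        f_prev, f_cur = f_cur, f_next
    return ratios
```

Each ratio equals $(1 - \varepsilon - \delta\varepsilon) - 2\delta\varepsilon / (f(k)/f(k-1))$, so once the previous ratio is at least $1/2$ the next one is at least $1 - \varepsilon - 5\delta\varepsilon$.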
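3.4 Hitting times of a union of sets: proofs 

We present the proof of Theorem 3.2 below. 

Proof: [of Theorem 3.2] There are three main steps in the proof, here outlined in a slightly 
oversimplified way. 

1. We show that Theorem 3.1 is applicable to the hitting times of $A_1, \dots, A_\ell$. In particular, 
this shows that $\mathbb{P}_\pi(H_{A_i} \leq \varepsilon m) \approx \varepsilon$. 

2. We show that 

\[ \mathbb{P}_\pi(H_A \leq \varepsilon m) \approx \sum_{i=1}^{\ell} \mathbb{P}_\pi(H_{A_i} \leq \varepsilon m) \approx \ell\varepsilon, \]

so that $t_{\ell\varepsilon}(A) \approx \varepsilon m$. 

3. Finally, we apply Theorem 3.1 to $H_A$ and deduce that this random variable is approximately 
exponential with mean: 

\[ \mathbb{E}_\pi[H_A] \approx \frac{t_{\ell\varepsilon}(A)}{\ell\varepsilon} \approx \frac{m}{\ell}. \]

The actual proof is only slightly more complicated than this outline. We begin with a 
claim corresponding to step 1 above. 

Claim 3.2 For all $1 \leq i \leq \ell$, 

\[ \varepsilon_i \equiv \mathbb{P}_\pi(H_{A_i} \leq \varepsilon m) = (1 + O(\delta))\,\varepsilon. \]

Proof: [of the Claim] Consider some $\varepsilon' \in [\varepsilon/2, 2\varepsilon]$. Notice that $t^Q_{\mathrm{mix}}(\delta\varepsilon') \leq t^Q_{\mathrm{mix}}(\delta\varepsilon/2)$ and 
therefore: 

\[ \mathbb{P}_\pi\left(H_{A_i} \leq t^Q_{\mathrm{mix}}(\delta\varepsilon')\right) \leq \frac{\delta\varepsilon}{2} \leq \delta\varepsilon'. \]

This shows that Theorem 3.1 is applicable with $A_i$ replacing $A$ and $\varepsilon'$ replacing $\varepsilon$. We deduce 
in particular that: 

\[ \forall\, \frac{\varepsilon}{2} \leq \varepsilon' \leq 2\varepsilon,\quad \left|\frac{\varepsilon'\,\mathbb{E}_\pi[H_{A_i}]}{t_{\varepsilon'}(A_i)} - 1\right| \leq O(\delta + \varepsilon') = O(\delta). \]

In particular, there exists a universal constant $c > 0$ such that if $\varepsilon' \leq (1 - c\delta)\,\varepsilon$, then 
$t_{\varepsilon'}(A_i) \leq \varepsilon\,\mathbb{E}_\pi[H_{A_i}]$, whereas if $\varepsilon' \geq (1 + c\delta)\,\varepsilon$, $t_{\varepsilon'}(A_i) \geq \varepsilon\,\mathbb{E}_\pi[H_{A_i}]$. In other words, 

\[ (1 - c\delta)\,\varepsilon \leq \mathbb{P}_\pi\left(H_{A_i} \leq \varepsilon\,\mathbb{E}_\pi[H_{A_i}]\right) \leq (1 + c\delta)\,\varepsilon. \]

□ 

We now come to the second part of the proof. 

Claim 3.3 Let $\xi$ be as in the statement of Theorem 3.2. Then: 

\[ \mathbb{P}_\pi(H_A \leq \varepsilon m) = (1 + O(\delta + \xi))\,\ell\varepsilon. \]
In particular, there exists a number $\eta = (1 + O(\delta + \xi))\,\ell\varepsilon$ with $\varepsilon m = t_\eta(A)$. 
Proof: To see this, we note that: 

\[ \{H_A \leq \varepsilon m\} = \bigcup_{i=1}^{\ell} \{H_{A_i} \leq \varepsilon m\}. \]

The union bound gives: 

\[ \mathbb{P}_\pi(H_A \leq \varepsilon m) \leq \sum_{i=1}^{\ell} \mathbb{P}_\pi(H_{A_i} \leq \varepsilon m) \leq (1 + O(\delta))\,\ell\varepsilon. \]

A lower bound can be obtained via the Bonferroni inequality: 

\[ \mathbb{P}_\pi(H_A \leq \varepsilon m) \geq \sum_{i=1}^{\ell} \mathbb{P}_\pi(H_{A_i} \leq \varepsilon m) - \sum_{1 \leq i < j \leq \ell} \mathbb{P}_\pi\left(H_{A_i} \leq \varepsilon m,\ H_{A_j} \leq \varepsilon m\right) = (1 + O(\delta + \xi))\,\ell\varepsilon \]

using the definition of $\xi$. □ 

We now need to show that the assumptions of Theorem 3.1 are applicable to $H_A$, with the 
value of $\eta$ in Claim 3.3 replacing $\varepsilon$. We assume that $\delta + \xi$ is small enough, which we may do 
because otherwise the Theorem is trivial. In particular, we can assume that the $O(\xi + \delta)$ 
term in the expression for $\eta$ is between $-1/2$ and $1$, so that: 

\[ \frac{\varepsilon\ell}{2} \leq \eta \leq 2\varepsilon\ell. \]

Since we also assumed $\varepsilon < \delta/2\ell$, we have $\eta < \delta$. Moreover, $t^Q_{\mathrm{mix}}(\delta\eta) \leq t^Q_{\mathrm{mix}}(\delta\varepsilon/2)$. This 
implies: 

\[ \mathbb{P}_\pi\left(H_A \leq t^Q_{\mathrm{mix}}(\delta\eta)\right) \leq \sum_{i=1}^{\ell} \mathbb{P}_\pi\left(H_{A_i} \leq t^Q_{\mathrm{mix}}(\delta\varepsilon/2)\right) \leq \frac{\ell\delta\varepsilon}{2} \leq \delta\eta. \]

We may now apply Theorem 3.1 (with $\eta$ replacing $\varepsilon$) to deduce that for any $\lambda \in M_1(V)$, 

\[ \mathrm{Law}_\lambda(H_A) = \mathrm{Exp}\left(t_\eta(A)/\eta,\ O(\eta) + 2r_\lambda,\ O(\delta + \xi)\right). \]
To finish the proof, we note that $\eta = O(\ell\varepsilon)$, 

\[ \frac{t_\eta(A)}{\eta} = \frac{\varepsilon m}{(1 + O(\delta + \xi))\,\ell\varepsilon} = (1 + O(\delta + \xi))\,\frac{m}{\ell} \]
and apply Proposition 3.1. □ 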
4 Meeting times of multiple random walks 

We now put our two exponential approximation results to use, showing that the meeting 
times we are interested in are well-approximated by exponential random variables. Much 
of the work needed for this is contained in technical estimates whose proofs can be safely 
skipped in a first reading. 

4.1 Basic definitions 

For the remainder of this section, $V$ is a finite set and $Q$ is the generator of a mixing Markov 
chain over $V$ with mixing times $t^Q_{\mathrm{mix}}(\cdot)$ and stationary measure $\pi$. For each $k \in \mathbb{N} \setminus \{0, 1\}$ 
we will also consider the Markov chains $Q^{(k)}$ over $V^k$ that correspond to $k$ independent 
realizations of $Q$ from prescribed initial states, as defined in Section 2.3.4. We will also 
follow the notation from that section. 

For $k = 2$, we define the first meeting time: 

\[ M \equiv \inf\{t \geq 0 : X_t(1) = X_t(2)\} \qquad (14) \]

and the parameters: 

\[ m(Q) \equiv \mathbb{E}_{\pi^{\otimes 2}}[M], \qquad (15) \]

\[ \rho(Q) \equiv \frac{t^Q_{\mathrm{mix}}}{m(Q)}. \qquad (16) \]

We also define an extra parameter $\mathrm{err}(Q)$ which will appear as an error term at several 
different points in the paper. This parameter $\mathrm{err}(Q)$ is defined as 

\[ \mathrm{err}(Q) = c_0 \sqrt{\rho(Q) \ln(1/\rho(Q))} \quad \text{if } Q \text{ is reversible and transitive.} \qquad (17) \]
For other $Q$, we define it as: 

\[ \mathrm{err}(Q) = c_1 \sqrt{(1 + q_{\max} t^Q_{\mathrm{mix}})\, \pi_{\max} \ln\left(\frac{1}{(1 + q_{\max} t^Q_{\mathrm{mix}})\, \pi_{\max}}\right)}. \qquad (18) \]

The numbers $c_0, c_1 > 0$ are universal constants that we do not specify explicitly. We choose 
them so as to satisfy Propositions 4.1, 4.4 and 4.5 below. 

We now take $k > 2$ and consider the process $Q^{(k)}$, with trajectories 

\[ \left(X^{(k)}_t = (X_t(1), X_t(2), \dots, X_t(k))\right)_{t \geq 0} \]

corresponding to $k$ independent realizations of $Q$ (cf. Section 2.3.4). This has stationary 
distribution $\pi^{\otimes k}$. 

We write $M^{(k)}$ for the first meeting time among these random walks: 

\[ M^{(k)} \equiv \inf\{t \geq 0 : \exists\, 1 \leq i < j \leq k,\ X_t(i) = X_t(j)\}. \qquad (19) \]
One may note that 

\[ M^{(k)} = \min_{\{i,j\} \in \binom{[k]}{2}} M_{i,j} \]

where $\binom{[k]}{2}$ was defined in Section 2 and, for $1 \leq i < j \leq k$: 

\[ M_{i,j} = M_{j,i} \equiv \inf\{t \geq 0 : X_t(i) = X_t(j)\} \qquad (20) \]
is distributed as $M$ for a realization of $Q^{(2)}$ starting from $(X_0(i), X_0(j))$. 

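4.2 Technical estimates for reversible and transitive chains 

In this subsection we collect the estimates that we will use in the case of chains that are 
reversible and transitive. 

Proposition 4.1 Assume $Q$ is reversible and transitive and define $\mathrm{err}(Q)$ accordingly. If 
$\mathrm{err}(Q) \leq 1/4$, then: 

\[ \mathbb{P}_{\pi^{\otimes 2}}\left(M \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right) \leq \mathrm{err}(Q)^2. \]

Remark 5 The proof is entirely general, but we will only use this estimate in the 
transitive/reversible case. 

Proof: We will prove a result in contrapositive form: if $0 < \beta < 1/4$ is such that: 

\[ \mathbb{P}_{\pi^{\otimes 2}}\left(M \leq t^Q_{\mathrm{mix}}(\beta)\right) > \beta, \]

then $\beta < c_0^2\, \rho(Q) \ln(1/\rho(Q))$ for some universal $c_0 > 0$. 
Notice that for any $x^{(2)} \in V^2$, 

\[ \mathbb{P}_{x^{(2)}}\left(M > t^Q_{\mathrm{mix}}(\beta/4) + t^Q_{\mathrm{mix}}(\beta)\right) \leq \mathbb{P}_{x^{(2)}}\left(M \circ \Theta_{t^Q_{\mathrm{mix}}(\beta/4)} > t^Q_{\mathrm{mix}}(\beta)\right) \]
\[ \leq \mathbb{P}_{\pi^{\otimes 2}}\left(M > t^Q_{\mathrm{mix}}(\beta)\right) + \beta/2 \]
\[ \leq 1 - \beta/2, \]

where the middle inequality follows from the fact that $t^Q_{\mathrm{mix}}(\beta/4)$ is an upper bound for the 
$\beta/2$-mixing time of $Q^{(2)}$ (cf. Lemma 2.4). A standard argument using the Markov property 
implies that for any $k \in \mathbb{N}$, 

\[ \mathbb{P}_{x^{(2)}}\left(M > k\,(t^Q_{\mathrm{mix}}(\beta/4) + t^Q_{\mathrm{mix}}(\beta))\right) \leq (1 - \beta/2)^k \]

so that: 

\[ m(Q) = \mathbb{E}_{\pi^{\otimes 2}}[M] \leq C\, \frac{t^Q_{\mathrm{mix}}(\beta) + t^Q_{\mathrm{mix}}(\beta/4)}{\beta}. \]

Since $t^Q_{\mathrm{mix}}(\alpha) \leq C \ln(1/\alpha)\, t^Q_{\mathrm{mix}}$, we deduce that: 

\[ \frac{\beta}{\ln(1/\beta)} \leq c\, \rho(Q) \]

with $c > 0$ universal, which implies the desired result. □ 

We now prove an estimate on correlations. 

Proposition 4.2 Assume $Q$ is transitive. Then for all $t, s \geq 0$ and $\{i,j\}, \{\ell,r\} \subset [k]$ with 
$\{i,j\} \neq \{\ell,r\}$, 

\[ \mathbb{P}_{\pi^{\otimes k}}\left(M_{i,j} \leq t,\ M_{\ell,r} \leq s\right) \leq 2\, \mathbb{P}_{\pi^{\otimes 2}}(M \leq s)\, \mathbb{P}_{\pi^{\otimes 2}}(M \leq t). \]

Proof: If $\{i,j\} \cap \{\ell,r\} = \emptyset$, the events $\{M_{i,j} \leq t\}$ and $\{M_{\ell,r} \leq s\}$ are independent. Since 
the laws of both $M_{i,j}$ and $M_{\ell,r}$ under $\pi^{\otimes k}$ are equal to the law of $M$ under $\pi^{\otimes 2}$, we obtain: 

\[ \mathbb{P}_{\pi^{\otimes k}}\left(M_{i,j} \leq t,\ M_{\ell,r} \leq s\right) \leq \mathbb{P}_{\pi^{\otimes 2}}(M \leq s)\, \mathbb{P}_{\pi^{\otimes 2}}(M \leq t) \]

in this case. Assume now $\{i,j\} \cap \{\ell,r\}$ has one element. Without loss of generality we may 
assume $k = 3$, $\{i,j\} = \{1,2\}$ and $\{\ell,r\} = \{1,3\}$. We have: 

\[ \mathbb{P}_{\pi^{\otimes 3}}\left(M_{1,2} \leq t,\ M_{1,3} \leq s\right) \leq \mathbb{P}_{\pi^{\otimes 3}}\left(M_{1,2} \leq t,\ M_{1,3} \circ \Theta_{M_{1,2}} \leq s\right) + \mathbb{P}_{\pi^{\otimes 3}}\left(M_{1,3} \leq s,\ M_{1,2} \circ \Theta_{M_{1,3}} \leq t\right). \qquad (21) \]

Consider the first term in the RHS. By the Markov property: 

\[ \mathbb{P}_{\pi^{\otimes 3}}\left(M_{1,2} \leq t,\ M_{1,3} \circ \Theta_{M_{1,2}} \leq s\right) = \mathbb{P}_{\pi^{\otimes 2}}(M \leq t)\, \mathbb{P}_{\lambda^{(2)}}(M \leq s) \]

where $\lambda^{(2)}$ is the law of $(X_{M_{1,2}}(1), X_{M_{1,2}}(3))$ conditionally on $M_{1,2} \leq t$. Since $(X_t(3))_t$ is 
stationary and independent from this event, $\lambda^{(2)} = \lambda \otimes \pi$ for some $\lambda \in M_1(V)$ which is the 
law of $X_{M_{1,2}}(1)$ under $\mathbb{P}_{\pi^{\otimes 2}}$. The transitivity of $Q$ (which implies that $\pi$ is uniform) implies 
that $\lambda = \pi$ and therefore: 

\[ \mathbb{P}_{\pi^{\otimes 3}}\left(M_{1,2} \leq t,\ M_{1,3} \circ \Theta_{M_{1,2}} \leq s\right) = \mathbb{P}_{\pi^{\otimes 2}}(M \leq t)\, \mathbb{P}_{\pi^{\otimes 2}}(M \leq s). \]

The same bound can be shown for the other term in the RHS of (21), and this implies the 
Proposition. □ 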



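4.3 Technical estimates for the general case 

We will need the following general result: 

Proposition 4.3 For any $\lambda \in M_1(V)$ and $T \geq 0$, 

\[ \mathbb{P}_{\lambda \otimes \pi}(M \leq T) \leq (1 + 2T q_{\max})\, \pi_{\max}. \]

Proof: Let $(X_t)_{t \geq 0}$ be a single realization of $Q$. One may imagine that the trajectory of $(X_t)_{t \geq 0}$ 
is sampled as follows. First, let $\mathcal{P}$ be a Poisson process with intensity $q_{\max}$ independent from 
the initial state $X_0$. At each time $t \in \mathcal{P}$, one updates the value of $X_t$ as follows: if $X_s = x$ 
for $s$ immediately before $t$, one sets: 

\[ X_t = y \text{ with probability } \frac{q(x,y)}{q_{\max}} \quad (y \in V \setminus \{x\}) \]

and $X_t = x$ with the remaining probability. This implies that, at the points of the Poisson 
process, $X_t$ is updated as in the discrete-time Markov chain with matrix $P = (I + Q/q_{\max})$, 
and it is easy to see that $\pi$ is stationary for this chain. 

Now let $X_t(1), X_t(2)$ be independent trajectories of $Q$, with $X_t(1)$ started from $\lambda$ and 
$X_t(2)$ started from the stationary distribution $\pi$. We will imagine that each $X_t(i)$ has its 
own Poisson process $\mathcal{P}(i)$ and was generated in the way described above. It then follows 
that: 

\[ \mathbb{P}_{\lambda \otimes \pi}\left(M \leq T \mid \mathcal{P}(1), \mathcal{P}(2)\right) \leq \mathbb{P}_{\lambda \otimes \pi}\left(X_0(1) = X_0(2)\right) + \sum_{t \in (\mathcal{P}(1) \cup \mathcal{P}(2)) \cap (0,T]} \mathbb{P}_{\lambda \otimes \pi}\left(X_t(1) = X_t(2) \mid \mathcal{P}(1), \mathcal{P}(2)\right) \]

since the processes can only change values at the times of the two Poisson processes. At 
time 0 we have: 

\[ \mathbb{P}_{\lambda \otimes \pi}\left(X_0(1) = X_0(2)\right) = \sum_{x \in V} \lambda(x)\pi(x) \leq \pi_{\max}. \]

For $t \in \mathcal{P}(1) \cup \mathcal{P}(2)$, the law of $X_t(1), X_t(2)$ equals: 

\[ \lambda P^{k_1} \otimes \pi P^{k_2} \]

where $k_i = |\mathcal{P}(i) \cap (0,t]|$ $(i = 1,2)$. Crucially, $\pi$ is stationary for $P$, hence $\pi P^{k_2} = \pi$ and we 
obtain: 

\[ \mathbb{P}_{\lambda \otimes \pi}\left(X_t(1) = X_t(2) \mid \mathcal{P}(1), \mathcal{P}(2)\right) = \sum_{x \in V} (\lambda P^{k_1})(x)\,\pi(x) \leq \pi_{\max} \]

as for $t = 0$. We deduce: 

\[ \mathbb{P}_{\lambda \otimes \pi}\left(M \leq T \mid \mathcal{P}(1), \mathcal{P}(2)\right) \leq \left(1 + |(\mathcal{P}(1) \cup \mathcal{P}(2)) \cap (0,T]|\right) \pi_{\max}. \]
The Proposition follows from taking expectations on both sides and noticing that: 

\[ \mathbb{E}\left[|(\mathcal{P}(1) \cup \mathcal{P}(2)) \cap (0,T]|\right] = 2T q_{\max}. \]

□ 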
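The two facts driving this proof, that $\pi$ is stationary for the uniformized matrix $P = I + Q/q_{\max}$ and that $\sum_x (\lambda P^{k})(x)\pi(x) \leq \pi_{\max}$ for every $k$, can be checked exactly on a small example. The sketch below uses a hypothetical 3-state generator `Q` and its stationary law `pi`, both chosen only for illustration; the helper names are likewise assumptions.

```python
from fractions import Fraction

def uniformize(Q):
    """Return (P, q_max) with P = I + Q / q_max, for an integer generator Q."""
    n = len(Q)
    q_max = max(-Q[x][x] for x in range(n))
    P = [[Fraction(int(x == y)) + Fraction(Q[x][y], q_max) for y in range(n)]
         for x in range(n)]
    return P, q_max

def step(mu, P):
    """Row vector times matrix: (mu P)(y) = sum_x mu(x) P(x, y)."""
    n = len(P)
    return [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]

def collision_probability(lam, pi):
    """sum_x lam(x) pi(x): chance that independent draws from lam and pi agree."""
    return sum(l * p for l, p in zip(lam, pi))

# Hypothetical generator (rows sum to zero) and its stationary distribution.
Q = [[-2, 1, 1],
     [3, -4, 1],
     [1, 1, -2]]
pi = [Fraction(7, 15), Fraction(3, 15), Fraction(5, 15)]
P, q_max = uniformize(Q)
```

Exact rational arithmetic is used so that the identity $\pi P = \pi$ holds with equality rather than up to rounding.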

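We now prove an estimate corresponding to Proposition 4.1 in this general setting. 
Proposition 4.4 Assume $\mathrm{err}(Q)$ is as defined in (18). Then: 

\[ \mathbb{P}_{\pi^{\otimes 2}}\left(M \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right) \leq \mathrm{err}(Q)^2. \]
Proof: The previous proposition implies: 

\[ \mathbb{P}_{\pi^{\otimes 2}}\left(M \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right) \leq \left(1 + 2\,t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)\, q_{\max}\right) \pi_{\max} \leq C\,(1 + 2\,t^Q_{\mathrm{mix}}\, q_{\max})\, \pi_{\max} \ln(1/\mathrm{err}(Q)). \]

This is $\leq \mathrm{err}(Q)^2$ by definition of this quantity, if we choose $c_1$ in (18) to be large enough. 
□ 

We now prove an estimate on correlations that is similar to Proposition 4.2 but with an 
extra term. Recall that $\binom{[k]}{2}$ was defined in Section 2.1. 

Proposition 4.5 For any mixing Markov chain $Q$, if one defines $\mathrm{err}(Q)$ as in (18), we have 
the following inequality for $k \geq 3$ and all distinct pairs $\{i,j\}, \{\ell,r\} \in \binom{[k]}{2}$: 

\[ \mathbb{P}_{\pi^{\otimes k}}\left(M_{i,j} \leq t,\ M_{\ell,r} \leq s\right) \leq 2\, \mathbb{P}_{\pi^{\otimes 2}}(M \leq t)\, \mathbb{P}_{\pi^{\otimes 2}}(M \leq s) + O\left(\mathrm{err}(Q)^2\right). \]

Proof: The case $\{i,j\} \cap \{\ell,r\} = \emptyset$ follows as in the proof of Proposition 4.2. In case 
$\{i,j\} \cap \{\ell,r\}$ has one element, we may again assume that $i = \ell = 1$, $j = 2$ and $r = 3$. 
Equation (21) still applies, so we proceed to bound: 

\[ \mathbb{P}_{\pi^{\otimes 3}}\left(M_{1,2} \leq t,\ M_{1,3} \circ \Theta_{M_{1,2}} \leq s\right), \]

which is upper bounded by: 

\[ \mathbb{P}_{\pi^{\otimes 3}}\left(M_{1,2} \leq t,\ M_{1,3} \circ \Theta_{M_{1,2}} \leq s\right) \leq \mathbb{P}_{\pi^{\otimes 3}}\left(M_{1,2} \leq t,\ M_{1,3} \circ \Theta_{M_{1,2}} \leq t^Q_{\mathrm{mix}}(\eta)\right) + \mathbb{P}_{\pi^{\otimes 3}}\left(M_{1,2} \leq t,\ M_{1,3} \circ \Theta_{M_{1,2} + t^Q_{\mathrm{mix}}(\eta)} \leq s\right) =: (I) + (II) \qquad (22) \]

for some $\eta \in (0, 1/4)$ to be chosen later. 
Term $(I)$ is equal to: 

\[ \mathbb{P}_{\pi^{\otimes 3}}(M_{1,2} \leq t)\, \mathbb{P}_{\lambda^{(2)}}\left(M \leq t^Q_{\mathrm{mix}}(\eta)\right) \]

where $\lambda^{(2)}$ is the law of $(X_{M_{1,2}}(1), X_{M_{1,2}}(3))$ conditionally on $\{M_{1,2} \leq t\}$. As in the previous 
proof, $(X_t(3))_t$ is stationary and independent from the conditioning, hence $\lambda^{(2)} = \lambda \otimes \pi$ for 
some $\lambda \in M_1(V)$. We use Proposition 4.3 to deduce: 

\[ (I) \leq \mathbb{P}_{\pi^{\otimes 3}}(M_{1,2} \leq t)\, O\left((1 + t^Q_{\mathrm{mix}}(\eta)\, q_{\max})\, \pi_{\max}\right). \]
The analysis of term $(II)$ is simpler: we have 

\[ (II) = \mathbb{P}_{\pi^{\otimes 3}}(M_{1,2} \leq t)\, \mathbb{P}_{\lambda_\eta \otimes \pi}(M \leq s) \]

for some $\lambda_\eta \in M_1(V)$ which is the law of $X_{M_{1,2} + t^Q_{\mathrm{mix}}(\eta)}(1)$ conditionally on $\{M_{1,2} \leq t\}$. The time 
shift by $t^Q_{\mathrm{mix}}(\eta)$ implies that $\lambda_\eta$ is $\eta$-close to stationary, hence: 

\[ (II) \leq \mathbb{P}_{\pi^{\otimes 3}}(M_{1,2} \leq t)\, \left(\eta + \mathbb{P}_{\pi^{\otimes 2}}(M \leq s)\right). \]

We deduce from (22) that: 

\[ \mathbb{P}_{\pi^{\otimes 3}}\left(M_{1,2} \leq t,\ M_{1,3} \circ \Theta_{M_{1,2}} \leq s\right) \leq (I) + (II) \leq \mathbb{P}_{\pi^{\otimes 2}}(M \leq t)\, \mathbb{P}_{\pi^{\otimes 2}}(M \leq s) + \mathbb{P}_{\pi^{\otimes 2}}(M \leq t)\, O\left(\eta + (1 + t^Q_{\mathrm{mix}}(\eta)\, q_{\max})\, \pi_{\max}\right). \qquad (23) \]

Recall $t^Q_{\mathrm{mix}}(\eta) \leq C\, t^Q_{\mathrm{mix}} \ln(1/\eta)$ for some universal $C > 0$. If 

\[ \eta_0 \equiv (1 + q_{\max} t^Q_{\mathrm{mix}})\, \pi_{\max} < 1/2 \]

we may take $\eta = \eta_0$ to obtain: 

\[ \mathbb{P}_{\pi^{\otimes 3}}\left(M_{1,2} \leq t,\ M_{1,3} \circ \Theta_{M_{1,2}} \leq s\right) \leq \mathbb{P}_{\pi^{\otimes 2}}(M \leq t)\, \mathbb{P}_{\pi^{\otimes 2}}(M \leq s) + \mathbb{P}_{\pi^{\otimes 2}}(M \leq t)\, O\left(\eta_0 \ln\frac{1}{\eta_0}\right). \]

The case of $\eta_0 \geq 1/2$ is covered "automatically" by the big-oh notation. 

An analogous bound can be obtained with the roles of $(t, 2)$ and $(s, 3)$ reversed. Plugging 
these into (21) gives the desired bound. □ 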

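4.4 Exponential approximation for a pair of particles 

We now come back to the setting of Section 4.1 and show $M$ is approximately exponentially 
distributed. 

Lemma 4.1 Define $\mathrm{err}(Q)$ as in (17) (if $Q$ is reversible and transitive) or as in (18) (if 
not) and assume that $\mathrm{err}(Q) < 1/10$. Then $\forall \lambda^{(2)} \in M_1(V^2)$: 

\[ \mathrm{Law}_{\lambda^{(2)}}(M) = \mathrm{Exp}\left(m(Q),\ O(\mathrm{err}(Q)) + 2\,\mathbb{P}_{\lambda^{(2)}}\left(M \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right),\ O(\mathrm{err}(Q))\right). \]

Proof: This is a direct application of Theorem 3.1 to the hitting time of the diagonal set 

\[ \Delta \equiv \{(x,x) : x \in V\} \subset V^2 \]

by the chain with generator $Q^{(2)}$ defined in Section 2 and with $\varepsilon = \mathrm{err}(Q)$, $\delta = 2\,\mathrm{err}(Q)$. All 
we need to show is that: 

\[ \mathbb{P}_{\pi^{\otimes 2}}\left(M \leq t^{Q^{(2)}}_{\mathrm{mix}}(\delta\varepsilon)\right) \leq \varepsilon\delta \]
where $t^{Q^{(2)}}_{\mathrm{mix}}(\cdot)$ denotes the mixing times of $Q^{(2)}$. This inequality follows from 

\[ t^{Q^{(2)}}_{\mathrm{mix}}(2\,\mathrm{err}(Q)^2) \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2) \quad \text{(Lemma 2.4)} \]

and 

\[ \mathbb{P}_{\pi^{\otimes 2}}\left(M \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right) \leq \mathrm{err}(Q)^2 \leq \delta\varepsilon, \]

which follows from Proposition 4.1 in the reversible/transitive case and Proposition 4.4 in 
the general case. □ 

4.5 Exponential approximation for many random walkers 

We now consider the more complex problem of bounding the meeting times among $k > 2$ 
particles. We take the notation in Section 4.1 for granted. 

Lemma 4.2 Let $\ell = \binom{k}{2} > 0$ and assume that the quantity $\mathrm{err}(Q)$ defined in (17) (if $Q$ 
is reversible and transitive) or as in (18) (if not) satisfies $\mathrm{err}(Q) < 1/10\ell$. Then for all 
$\lambda^{(k)} \in M_1(V^k)$: 

\[ \mathrm{Law}_{\lambda^{(k)}}\left(M^{(k)}\right) = \mathrm{Exp}\left(\frac{m(Q)}{\binom{k}{2}},\ O(k^2\,\mathrm{err}(Q)) + 2r_{\lambda^{(k)}},\ O(k^2\,\mathrm{err}(Q))\right) \]

where $r_{\lambda^{(k)}} \equiv \mathbb{P}_{\lambda^{(k)}}\left(M^{(k)} \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right)$. 

Proof: $M^{(k)}$ is the hitting time of a union of $\ell$ sets. 

\[ \Delta^{(k)} = \bigcup_{\{i,j\} \in \binom{[k]}{2}} \Delta_{\{i,j\}} \quad \text{where } \Delta_{\{i,j\}} \equiv \{x^{(k)} \in V^k : x(i) = x(j)\}. \]

We will apply Theorem 3.2 to the product chain $Q^{(k)}$ to show that this hitting time 
is approximately exponential. We set $\delta = 2\ell\,\mathrm{err}(Q)$, $\varepsilon = \mathrm{err}(Q)$ and verify the conditions of 
the Theorem: 

• $0 < \delta < 1/5$, $0 < \varepsilon < \delta/2\ell$: These conditions follow from $\mathrm{err}(Q) < 1/10\ell$. 

• $\mathbb{P}_{\pi^{\otimes k}}\left(M_{i,j} \leq t^{Q^{(k)}}_{\mathrm{mix}}(\delta\varepsilon/2)\right) \leq \delta\varepsilon/2$. To prove this we simply observe that: 

\[ t^{Q^{(k)}}_{\mathrm{mix}}(\delta\varepsilon/2) \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2) \quad \text{(Lemma 2.4 and defn. of } \varepsilon, \delta\text{)} \]

and that: 

\[ \mathbb{P}_{\pi^{\otimes 2}}\left(M \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right) \leq \mathrm{err}(Q)^2 = \frac{\delta\varepsilon}{2\ell} \leq \frac{\delta\varepsilon}{2} \]
by Proposition 4.1 (in the reversible/transitive case) or by Proposition 4.4 (in general). 

• $\mathbb{E}_{\pi^{\otimes k}}[M_{i,j}] = m(Q)$ is the same for all $\{i,j\} \in \binom{[k]}{2}$: this is obvious. 

The Lemma will then follow once we show that the $\xi$ quantity in Theorem 3.2 satisfies: 

\[ \xi = \frac{1}{\ell\varepsilon} \sum_{\{i,j\} \neq \{\ell,r\} \text{ in } \binom{[k]}{2}} \mathbb{P}_{\pi^{\otimes k}}\left(M_{i,j} \leq \varepsilon\,m(Q),\ M_{\ell,r} \leq \varepsilon\,m(Q)\right) = O\left(k^2\,\mathrm{err}(Q)\right). \]

To start, we go back to Claim 3.2 in the proof of Theorem 3.2 and observe that, whenever 
the assumptions of that Theorem hold, 

\[ \mathbb{P}_{\pi^{\otimes k}}\left(M_{i,j} \leq \varepsilon\,m(Q)\right) = O(\varepsilon). \qquad (24) \]
Now note that Proposition 4.2 (in the reversible/transitive case) and Proposition 4.5 (in the 
general case) imply that each term in the sum defining $\xi$ is $O(\mathrm{err}(Q)^2)$. We deduce: 

\[ \xi \leq \frac{\ell^2\, O(\mathrm{err}(Q)^2)}{\ell\varepsilon} \leq O(\ell\varepsilon) = O\left(k^2\,\mathrm{err}(Q)\right). \]

□ 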



5 Coalescing random walks: basics 

In this section we formally define the coalescing random walks process. We then show that, if 
the initial number of particles is not large, mean field behaviour follows from the exponential 
approximation of meeting times. 

5.1 Definitions 

Fix a Markov chain $Q$ on a finite state space $V$. Given a number $k \in [|V|] \setminus \{1\}$ and an 
initial state $x^{(k)} \in V^k$, consider a realization of $Q^{(k)}$: 

\[ (X^{(k)}_t)_{t \geq 0} = (X_t(1), \dots, X_t(k))_{t \geq 0}. \]

We build the coalescing random walks process from $X^{(k)}$ by defining the trajectories of the 
$k$ walkers one by one. We first set: 

\[ \bar{X}_t(1) = X_t(1), \quad t \geq 0. \]

Given $j \in [k] \setminus \{1\}$, assume that $\bar{X}_t(i)$ has been defined for all $1 \leq i < j$ and $t \geq 0$. We let 
$\tau_j$ be the first time $t \geq 0$ at which $X_t(j) = \bar{X}_t(I_j)$ for some $1 \leq I_j < j$, and then set: 

\[ \bar{X}_t(j) = \begin{cases} X_t(j), & t < \tau_j; \\ \bar{X}_t(I_j), & t \geq \tau_j. \end{cases} \]

Intuitively, this says that as soon as $j$ encounters a walker with lower index, it starts moving 
along with it. The process: 

\[ (\bar{X}^{(k)}_t)_{t \geq 0} \equiv (\bar{X}_t(j))_{j \in [k],\, t \geq 0} \]
is what we call the coalescing random walks process based on $Q$, with initial state $x^{(k)}$. 

Remark 6 For any $j \geq 3$, there might be more than one index $i < j$ such that $\bar{X}_{\tau_j}(i) = 
X_{\tau_j}(j)$. However, it is easy to see that all such $i$ will have the same trajectory after time 
$\tau_j$, because they must have met by that time. This implies that there is no ambiguity in the 
definition of $\bar{X}_t(j)$ for any $j$. 

We also define: 

\[ C_i \equiv \inf\{t \geq 0 : |\{\bar{X}_t(j) : j \in [k]\}| \leq i\} \]
and $C \equiv C_1$. The fact that we are working in continuous time implies: 

Proposition 5.1 (Proof omitted) Assume that the initial state $x^{(k)} = (x(1), x(2), \dots, x(k))$ 
is such that $x(i) \neq x(j)$ for all $1 \leq i < j \leq k$. Then $C_k = 0 < C_{k-1} < C_{k-2} < \dots < C_1$ 
almost surely. 

It is sometimes useful to view the coalescing random walks process as a process with 
killings. Define a random $2^{[k]}$-valued process $(A_t)_{t \geq 0}$ as follows. 

• $1 \in A_t$ for all $t$; 

• proceeding recursively, for each $j \in [k] \setminus \{1\}$, we have $j \in A_t$ if and only if $\tilde\tau_j > t$, where 
$\tilde\tau_j$ is the first time $t$ at which $X_t(i) = X_t(j)$ for some $i < j$ with $i \in A_t$. 

Intuitively, $A_t$ is the set of all walkers that are "alive" at time $t \geq 0$, and a walker dies 
at the first time it meets an alive walker with smaller index. One may check that coalescing 
random walks is equivalent to the killed process in the following sense. 

Proposition 5.2 (Proof omitted) We have $\tilde\tau_j = \tau_j$ for all $j \in [k] \setminus \{1\}$. Moreover, for all 
$t \geq 0$ we have: 

\[ \{X_t(j) : j \in A_t\} = \{\bar{X}_t(j) : j \in [k]\}. \]

Finally, for all $i \in [k-1]$, 

\[ C_i = \inf\{t \geq 0 : |A_t| \leq i\}. \]
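The killed-process description translates directly into a simulation. The sketch below is an illustration under assumptions that are not in the text: it takes $Q$ to be the rate-1 walk on the complete graph $K_n$ with one walker per vertex, and the function name is hypothetical. In that special case a coalescence happens at rate $\binom{j}{2} \cdot 2/(n-1)$ while $j$ walkers are alive, so $\mathbb{E}[C] = (n-1)^2/n$, which the simulation can be compared against.

```python
import random

def full_coalescence_time(n, rng):
    """Simulate C for rate-1 walkers on K_n, one walker started at each vertex.

    Labels follow the process with killings: when two walkers share a vertex,
    the one with the larger index dies.
    """
    position = list(range(n))             # position[j] = current vertex of walker j
    alive = set(range(n))                 # the set A_t of alive walkers
    occupant = {v: v for v in range(n)}   # vertex -> alive walker standing on it
    t = 0.0
    while len(alive) > 1:
        t += rng.expovariate(len(alive))  # each alive walker jumps at rate 1
        j = rng.choice(sorted(alive))     # the walker that jumps
        old = position[j]
        new = rng.randrange(n - 1)        # uniform over the n - 1 other vertices
        if new >= old:
            new += 1
        del occupant[old]
        if new in occupant:               # meeting: the larger index is killed
            i = occupant[new]
            loser, winner = (j, i) if j > i else (i, j)
            alive.discard(loser)
            position[winner] = new
            occupant[new] = winner
        else:
            position[j] = new
            occupant[new] = j
    return t
```

Only alive walkers are moved, which is enough to read off $C$ since dead walkers play no role in later killings.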

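Recall that $M_{i,j}$ is the meeting time between walkers $i$ and $j$ (cf. (20)). We have the 
following simple proposition. 

Proposition 5.3 (Proof omitted) Assume that the initial state 

\[ x^{(k)} = (x(1), x(2), \dots, x(k)) \]
is such that $x(i) \neq x(j)$ for all $1 \leq i < j \leq k$. Then for each $1 \leq p \leq k - 1$, 

\[ C_p - C_{p+1} = \min_{\{i,j\} \subset A_{C_{p+1}}} M_{i,j} \circ \Theta_{C_{p+1}}. \]

Moreover, each time $C_p$ equals $M_{i,j}$ for some $\{i,j\} \in \binom{[k]}{2}$. 

5.2 Mean-field behavior for moderately large k 

We now prove a mean-field-like result for an initial number of particles $k$ that is not too 
large, assuming that meeting times of up to $k$ walkers satisfy our exponential approximation 
property. 

Lemma 5.1 Assume that $Q$, $\mathrm{err}(Q)$ and $k$ satisfy the assumptions of Lemma 4.2. Let 
$x^{(k)} \in V^k$. Then for all $p \in [k-1]$, 

\[ d_W\left(\mathrm{Law}_{x^{(k)}}\left(\frac{C_p}{m(Q)}\right),\ \mathrm{Law}\left(\sum_{i=p+1}^{k} Z_i\right)\right) \leq \frac{O(k^2\,\mathrm{err}(Q)) + 12\,\eta(x^{(k)})}{p} \]

where 

\[ \eta(x^{(k)}) \equiv \mathbb{P}_{x^{(k)}}\left(M^{(k)} \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right) + \mathbb{P}_{x^{(k)}}\left(\exists \{i,j\}, \{\ell,r\} \in \binom{[k]}{2} \text{ such that } \{\ell,r\} \neq \{i,j\} \text{ but } M_{\ell,r} \circ \Theta_{M_{i,j}} \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right) \]

and the $Z_i$ are the random variables described in (1). 
Proof: Write $x^{(k)} = (x(1), \dots, x(k))$. We will prove the similar bound: 

\[ \text{``}\forall 1 \leq i < j \leq k : x(i) \neq x(j)\text{''} \Rightarrow d_W\left(\mathrm{Law}_{x^{(k)}}\left(\frac{C_p}{m(Q)}\right),\ \mathrm{Law}\left(\sum_{i=p+1}^{k} Z_i\right)\right) \leq \frac{O(k^2\,\mathrm{err}(Q)) + 4\,\eta(x^{(k)})}{p}. \qquad (25) \]

To see how this implies the general result, consider some $x^{(k)}$ such that some of its coordinates 
are equal, so that in particular $\eta(x^{(k)}) \geq 1$. One still has the trivial bound: 

\[ d_W\left(\mathrm{Law}_{x^{(k)}}\left(\frac{C_p}{m(Q)}\right),\ \mathrm{Law}\left(\sum_{i=p+1}^{k} Z_i\right)\right) \leq \mathbb{E}_{x^{(k)}}\left[\frac{C_p}{m(Q)}\right] + \mathbb{E}\left[\sum_{i=p+1}^{k} Z_i\right]. \]

The second term in the RHS is $\leq 2/p$. For the first term, let $y^{(j)} = (y(1), \dots, y(j)) \in V^j$ 
have distinct coordinates with: 

\[ \{y(1), \dots, y(j)\} = \{x(1), \dots, x(k)\}. \]

Then clearly: 

\[ \mathbb{E}_{x^{(k)}}\left[\frac{C_p}{m(Q)}\right] = \mathbb{E}_{y^{(j)}}\left[\frac{C_p}{m(Q)}\right]. \]

If $p \geq j$ the RHS is 0. If not, it can be upper bounded using (25): 

\[ \mathbb{E}_{y^{(j)}}\left[\frac{C_p}{m(Q)}\right] \leq \mathbb{E}\left[\sum_{i=p+1}^{j} Z_i\right] + d_W\left(\mathrm{Law}_{y^{(j)}}\left(\frac{C_p}{m(Q)}\right),\ \mathrm{Law}\left(\sum_{i=p+1}^{j} Z_i\right)\right) \leq \frac{2 + 4\,\eta(y^{(j)}) + O(k^2\,\mathrm{err}(Q))}{p}. \]

Since $\eta(x^{(k)}) \geq 1 \geq \eta(y^{(j)})/2$ in this case, we obtain 

\[ d_W\left(\mathrm{Law}_{x^{(k)}}\left(\frac{C_p}{m(Q)}\right),\ \mathrm{Law}\left(\sum_{i=p+1}^{k} Z_i\right)\right) \leq \frac{12\,\eta(x^{(k)}) + O(k^2\,\mathrm{err}(Q))}{p} \]

for such $x^{(k)}$ with repetitions, which gives the Lemma in general. 

We prove (25) by reverse induction on $p$. The case $p = k - 1$ is trivial: $C_{k-1}$ is simply 
$M^{(k)}$ and $\eta(x^{(k)})$ is an upper bound for $r_{x^{(k)}}$, so we may apply Lemma 4.2 to deduce the 
desired bound. 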

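For the inductive step, consider $p_0 < k - 1$ and assume the result is true for all $p_0 < 
p \leq k - 1$. We will use the easily proven fact that $C_{p_0+1}$ is a stopping time for the 
$(X^{(k)}_t)_{t \geq 0}$ process. Consider the corresponding $\sigma$-field $\mathcal{F}_{C_{p_0+1}}$. We will apply Lemma 2.3 with: 

\[ W_1 \equiv \frac{C_{p_0+1}}{m(Q)}, \qquad Z_1 \equiv \sum_{i=p_0+2}^{k} Z_i, \]
\[ W_2 \equiv \frac{C_{p_0} - C_{p_0+1}}{m(Q)} = \frac{\min_{\{i,j\} \subset A_{C_{p_0+1}}} M_{i,j} \circ \Theta_{C_{p_0+1}}}{m(Q)}, \qquad Z_2 \equiv Z_{p_0+1}. \]

(We used Proposition 5.3 to obtain the second expression for $W_2$ above.) Applying Lemma 2.3 
in conjunction with the induction hypothesis gives: 

\[ d_W\left(\mathrm{Law}_{x^{(k)}}\left(\frac{C_{p_0}}{m(Q)}\right),\ \mathrm{Law}\left(\sum_{i=p_0+1}^{k} Z_i\right)\right) \leq \frac{O(k^2\,\mathrm{err}(Q)) + 4\,\eta(x^{(k)})}{p_0 + 1} + \mathbb{E}_{x^{(k)}}\left[d_W\left(\mathrm{Law}_{x^{(k)}}\left(\frac{C_{p_0} - C_{p_0+1}}{m(Q)} \,\Big|\, \mathcal{F}_{C_{p_0+1}}\right),\ \mathrm{Law}(Z_{p_0+1})\right)\right]. \qquad (26) \]

Note that $X^{(k)}_{C_{p_0+1}}$ is $\mathcal{F}_{C_{p_0+1}}$-measurable. The strong Markov property for $Q^{(k)}$ implies: 

\[ \mathrm{Law}_{x^{(k)}}\left(\frac{C_{p_0} - C_{p_0+1}}{m(Q)} \,\Big|\, \mathcal{F}_{C_{p_0+1}}\right) = \mathrm{Law}_{X^{(k)}_{C_{p_0+1}}}\left(\frac{C_{p_0}}{m(Q)}\right). \]

Now define $Y^{(p_0+1)}$ as the vector whose coordinates are the $p_0 + 1$ distinct points $X_{C_{p_0+1}}(i)$ 
with $i \in A_{C_{p_0+1}}$ (the order of the coordinates does not matter). Clearly, 

\[ \mathrm{Law}_{X^{(k)}_{C_{p_0+1}}}\left(\frac{C_{p_0}}{m(Q)}\right) = \mathrm{Law}_{Y^{(p_0+1)}}\left(\frac{M^{(p_0+1)}}{m(Q)}\right). \qquad (27) \]
By Lemma 4.2, this last law satisfies: 

\[ \mathrm{Law}_{Y^{(p_0+1)}}\left(\frac{M^{(p_0+1)}}{m(Q)}\right) = \mathrm{Exp}\left(\frac{1}{\binom{p_0+1}{2}},\ O(k^2\,\mathrm{err}(Q)) + 2r_{Y^{(p_0+1)}},\ O(k^2\,\mathrm{err}(Q))\right) \]

and Lemma 3.1 gives: 

\[ d_W\left(\mathrm{Law}_{Y^{(p_0+1)}}\left(\frac{M^{(p_0+1)}}{m(Q)}\right),\ \mathrm{Law}(Z_{p_0+1})\right) \leq \frac{O(k^2\,\mathrm{err}(Q)) + 4\,r_{Y^{(p_0+1)}}}{p_0(p_0 + 1)}. \]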

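Using the definition of $r_{Y^{(p_0+1)}}$, we obtain from (26) the following inequality: 

\[ d_W\left(\mathrm{Law}_{x^{(k)}}\left(\frac{C_{p_0}}{m(Q)}\right),\ \mathrm{Law}\left(\sum_{i=p_0+1}^{k} Z_i\right)\right) \leq \frac{O(k^2\,\mathrm{err}(Q)) + 4\,\eta(x^{(k)})}{p_0 + 1} + \frac{O(k^2\,\mathrm{err}(Q)) + 4\,\mathbb{E}_{x^{(k)}}\left[\mathbb{P}_{Y^{(p_0+1)}}\left(M^{(p_0+1)} \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right)\right]}{p_0(p_0 + 1)}. \qquad (28) \]

To finish, we need to show that the expected value in the RHS is $\leq \eta(x^{(k)})$. For this we 
recall (27) to note that 

\[ \mathbb{P}_{Y^{(p_0+1)}}\left(M^{(p_0+1)} \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right) = \mathbb{P}_{X^{(k)}_{C_{p_0+1}}}\left(\min_{\{i,j\} \subset A_{C_{p_0+1}}} M_{i,j} \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right) \]
\[ \text{(Prop. 5.3 + strong Markov)} \quad = \mathbb{P}_{x^{(k)}}\left(C_{p_0} - C_{p_0+1} \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2) \,\Big|\, \mathcal{F}_{C_{p_0+1}}\right). \]
Averaging gives: 

\[ \mathbb{E}_{x^{(k)}}\left[\mathbb{P}_{Y^{(p_0+1)}}\left(M^{(p_0+1)} \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right)\right] = \mathbb{P}_{x^{(k)}}\left(C_{p_0} - C_{p_0+1} \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right) \]
and Proposition 5.3 implies: 

\[ \mathbb{P}_{x^{(k)}}\left(C_{p_0} - C_{p_0+1} \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right) \leq \mathbb{P}_{x^{(k)}}\left(\bigcup_{\{i,j\} \neq \{\ell,r\}} \left\{M_{\ell,r} \circ \Theta_{M_{i,j}} \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right\}\right). \]

Since the RHS is $\leq \eta(x^{(k)})$, we are done. □ 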
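The bound "$\leq 2/p$" for the mean of $\sum_{i=p+1}^{k} Z_i$ used in this proof comes from a telescoping sum. The sketch below assumes, as in the mean field description of the introduction, that $Z_i$ is exponential with mean $1/\binom{i}{2}$; this is an assumption about equation (1), which lies outside this section, and the function name is hypothetical.

```python
from fractions import Fraction

def mean_field_tail_mean(p, k):
    """E[sum_{i=p+1}^k Z_i] when Z_i is exponential with mean 1 / binom(i, 2)."""
    # 1 / binom(i, 2) = 2 / (i (i - 1)) = 2/(i - 1) - 2/i, so the sum telescopes.
    return sum(Fraction(2, i * (i - 1)) for i in range(p + 1, k + 1))
```

The sum equals $2/p - 2/k$, which is at most $2/p$, and for $p = 1$ it tends to 2 as $k$ grows, matching the heuristic $\mathbb{E}[C] \approx 2m(G)$.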



6 Proofs of the main theorems 

6.1 The full coalescence time in the transitive case 

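In this section we prove Theorem 1.1. 

Proof: [of Theorem 1.1] Recall that $C = C_1$ by definition. Lemma 5.1 gives the following 
bound for any $k \leq \sqrt{1/(4\,\mathrm{err}(Q))} \wedge |V|$ and $x^{(k)} \in V^k$: 

\[ d_W\left(\mathrm{Law}_{x^{(k)}}\left(\frac{C_1}{m(Q)}\right),\ \mathrm{Law}\left(\sum_{i=2}^{k} Z_i\right)\right) \leq 12\,\eta(x^{(k)}) + O\left(k^2\,\mathrm{err}(Q)\right). \]

Notice that: 

\[ d_W\left(\mathrm{Law}\left(\sum_{i=2}^{k} Z_i\right),\ \mathrm{Law}\left(\sum_{i \geq 2} Z_i\right)\right) \leq \mathbb{E}\left[\sum_{i \geq k+1} Z_i\right] = \frac{2}{k}, \]

hence: 

\[ d_W\left(\mathrm{Law}_{x^{(k)}}\left(\frac{C_1}{m(Q)}\right),\ \mathrm{Law}\left(\sum_{i \geq 2} Z_i\right)\right) = 12\,\eta(x^{(k)}) + O\left(k^2\,\mathrm{err}(Q) + \frac{1}{k}\right). \]

Convexity of $d_W$ implies: 

Proposition 6.1 Under the assumptions of Theorem 1.1, if $\mathrm{err}(Q) < 1/4$, the following 
holds for $k \leq \sqrt{1/(4\,\mathrm{err}(Q))} \wedge |V|$ and $\lambda^{(k)} \in M_1(V^k)$: 

\[ d_W\left(\mathrm{Law}_{\lambda^{(k)}}\left(\frac{C_1}{m(Q)}\right),\ \mathrm{Law}\left(\sum_{i \geq 2} Z_i\right)\right) = O\left(\int \eta(x^{(k)})\, d\lambda^{(k)}(x^{(k)})\right) + O\left(k^2\,\mathrm{err}(Q) + \frac{1}{k}\right). \qquad (29) \]

Notice that our control of $C_1$ gets worse as $k$ increases, and we cannot use the above bound 
to approximate the law of $C_1$ started with one particle at each vertex of $V$. What we use 
instead is a truncation argument combined with the Sandwich Lemma for $d_W$ (Lemma 2.2 
above). For this we need to find two random variables 

\[ C_- \preceq_d C_1 \preceq_d C_+ \]

such that both $C_-/m(Q)$ and $C_+/m(Q)$ are close to $\sum_{i \geq 2} Z_i$. More specifically, we will show 
that: 

\[ d_W\left(\mathrm{Law}\left(\frac{C_\pm}{m(Q)}\right),\ \mathrm{Law}\left(\sum_{i \geq 2} Z_i\right)\right) = O\left(k^2\,\mathrm{err}(Q) + k^4\,\mathrm{err}(Q)^2 + \frac{1}{k} + \rho(Q) \ln(1/\rho(Q))\right). \qquad (30) \]

Before we continue, let us show how this last bound implies our result. Lemma 2.2 then 
gives: 

\[ d_W\left(\mathrm{Law}\left(\frac{C_1}{m(Q)}\right),\ \mathrm{Law}\left(\sum_{i \geq 2} Z_i\right)\right) = O\left(k^2\,\mathrm{err}(Q) + k^4\,\mathrm{err}(Q)^2 + \frac{1}{k} + \rho(Q) \ln(1/\rho(Q))\right). \]

Since $\rho(Q) \ln(1/\rho(Q)) = O(\mathrm{err}(Q))$, we may choose $k = \mathrm{err}(Q)^{-1/3}$ (which works for $\mathrm{err}(Q)$ 
sufficiently small) to obtain: 

\[ d_W\left(\mathrm{Law}\left(\frac{C_1}{m(Q)}\right),\ \mathrm{Law}\left(\sum_{i \geq 2} Z_i\right)\right) = O\left(\mathrm{err}(Q)^{1/3}\right) \]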
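and this is precisely the bound we seek because $\mathrm{err}(Q) = O\left(\sqrt{\rho(Q) \ln(1/\rho(Q))}\right)$. We now 
construct $C_-$, $C_+$ and prove that they have the required properties. 

Construction of $C_-$: pick $x(1), \dots, x(k) \in V$ from distribution $\pi$, independently and 
with replacement. Let $C_-$ denote the full coalescence time for $k$ walkers started from these 
positions. This might be degenerate: there might be more than one walker starting from 
some element of $V$, but this only means those particles will coalesce instantly. 

Clearly, $C_- \preceq_d C_1$. Moreover, 

\[ \mathrm{Law}\left(\frac{C_-}{m(Q)}\right) = \mathrm{Law}_{\pi^{\otimes k}}\left(\frac{C_1}{m(Q)}\right). \]

Therefore by Proposition 6.1, 

\[ d_W\left(\mathrm{Law}\left(\frac{C_-}{m(Q)}\right),\ \mathrm{Law}\left(\sum_{i \geq 2} Z_i\right)\right) = O\left(\int \eta(x^{(k)})\, d\pi^{\otimes k}(x^{(k)}) + k^2\,\mathrm{err}(Q) + \frac{1}{k}\right). \]

Notice that the integral in the RHS is at most: 

\[ \sum_{\{i,j\} \in \binom{[k]}{2}} \mathbb{P}_{\pi^{\otimes k}}\left(M_{i,j} \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right) + \sum_{\{i,j\} \neq \{\ell,r\}} \mathbb{P}_{\pi^{\otimes k}}\left(M_{\ell,r} \circ \Theta_{M_{i,j}} \leq t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right) = O\left(k^4\,\mathrm{err}(Q)^2\right) \qquad (31) \]

as can be deduced from the proofs of Propositions 4.2 and 4.1. We conclude that: 

\[ d_W\left(\mathrm{Law}\left(\frac{C_-}{m(Q)}\right),\ \mathrm{Law}\left(\sum_{i \geq 2} Z_i\right)\right) = O\left(k^4\,\mathrm{err}(Q)^2 + k^2\,\mathrm{err}(Q) + \frac{1}{k}\right). \qquad (32) \]

Construction of $C_+$: we will use the following simple stochastic domination result, 
which we describe in the language of the process with killings. Let $\tau \leq \sigma$ be stopping times 
for the $X^{(k)}$ process. If all killings are suppressed between time $\tau$ and $\sigma$, the resulting full 
coalescence time $C_+$ stochastically dominates $C_1$. We will use this result, whose proof we 
omit, with the following choice of $\tau$ and $\sigma$: 

\[ \tau = C_k \quad \text{and} \quad \sigma = C_k + t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2). \]

Lemma 2.1 implies: 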



m(g)' m(g) ; - m(g) m(g) m(g) 
Since $Q$ is transitive, $m(Q)$ can be bounded from below in terms of the maximal hitting 
time in Q [H Chapter 14]. Theorem 1.2 in [13] implies: 

E|Ql<^ + CtL 
for some universal C > 0. Recalling the definition of p{Q) in flTB]) . we obtain: 

E[CJ „/l 



Moreover, we also have 

4x(err(g)^) = O (ln(l/err(g)) 4,) = O (t^,, ln(l/p(g)) 

hence: 
This shows: 

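Now consider the time $C_1 \circ \Theta_\sigma$. Since all killings were suppressed between times $\tau = C_k$ and 
$\sigma = C_k + t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)$, there are $k$ alive particles at time $\sigma$. Letting $\lambda^{(k)}$ denote their law, 
we have: 

\[ \mathrm{Law}\left(\frac{C_1 \circ \Theta_\sigma}{m(Q)}\right) = \mathrm{Law}_{\lambda^{(k)}}\left(\frac{C_1}{m(Q)}\right) \]

and Proposition 6.1 implies: 

\[ d_W\left(\mathrm{Law}\left(\frac{C_1 \circ \Theta_\sigma}{m(Q)}\right),\ \mathrm{Law}\left(\sum_{i \geq 2} Z_i\right)\right) = O\left(\int \eta(x^{(k)})\, d\lambda^{(k)}(x^{(k)}) + k^2\,\mathrm{err}(Q) + \frac{1}{k}\right). \]

Now observe that 

\[ t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2) \geq t^{Q^{(k)}}_{\mathrm{mix}}(k\,\mathrm{err}(Q)^2) \quad \text{(cf. Lemma 2.4)} \]

hence the law of the $k$ particles at time $C_k + t^Q_{\mathrm{mix}}(\mathrm{err}(Q)^2)$ is $k\,\mathrm{err}(Q)^2$-close to stationary, 
irrespective of their states at time $C_k$. We deduce that $\lambda^{(k)}$ is $k\,\mathrm{err}(Q)^2$-close to stationary, 
and deduce: 

\[ \int \eta(x^{(k)})\, d\lambda^{(k)}(x^{(k)}) \leq \int \eta(x^{(k)})\, d\pi^{\otimes k}(x^{(k)}) + O\left(k\,\mathrm{err}(Q)^2\right). \]

The integral in the RHS was estimated in (31), and we deduce: 

\[ d_W\left(\mathrm{Law}\left(\frac{C_1 \circ \Theta_\sigma}{m(Q)}\right),\ \mathrm{Law}\left(\sum_{i \geq 2} Z_i\right)\right) = O\left(k^2\,\mathrm{err}(Q) + k^4\,\mathrm{err}(Q)^2 + \frac{1}{k}\right) \]

and we deduce: 

\[ d_W\left(\mathrm{Law}\left(\frac{C_+}{m(Q)}\right),\ \mathrm{Law}\left(\sum_{i \geq 2} Z_i\right)\right) = O\left(k^2\,\mathrm{err}(Q) + k^4\,\mathrm{err}(Q)^2 + \frac{1}{k} + \rho(Q) \ln(1/\rho(Q))\right). \]

□ 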
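The choice $k = \mathrm{err}(Q)^{-1/3}$ made earlier in this proof balances the growing term $k^2\,\mathrm{err}(Q)$ against the decaying term $1/k$. A minimal sketch of that arithmetic follows, ignoring the integrality of $k$ and the term $\rho(Q)\ln(1/\rho(Q))$, which does not depend on $k$; the function names are hypothetical.

```python
def truncation_error(k, err):
    """The k-dependent part of the bound: k^2 err + k^4 err^2 + 1/k."""
    return k ** 2 * err + k ** 4 * err ** 2 + 1.0 / k

def balanced_k(err):
    """The choice k = err^(-1/3), which equates k^2 * err with 1/k."""
    return err ** (-1.0 / 3.0)
```

At this value of $k$ the three terms are $\mathrm{err}^{1/3}$, $\mathrm{err}^{2/3}$ and $\mathrm{err}^{1/3}$, so the total is $O(\mathrm{err}(Q)^{1/3})$.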



6.2 The general setting 

We now come to the proof of Theorem 1.2. 

Proof: [of Theorem 1.2] The proof is essentially the same as in the reversible/transitive case, 
but with the definition of $\mathrm{err}(Q)$ given in (18). In particular, we can still use the same 
definition of $C_-$ used in that proof to obtain: 

\[ d_W\left(\mathrm{Law}\left(\frac{C_-}{m(Q)}\right),\ \mathrm{Law}\left(\sum_{i \geq 2} Z_i\right)\right) = O\left(k^2\,\mathrm{err}(Q) + k^4\,\mathrm{err}(Q)^2 + \frac{1}{k}\right). \qquad (33) \]



Y^=0 (k' err(g) + k'eniQf + . 



m(g)'f 

We will need a different strategy in the analysis of C+, where we need to bound E [C^] 
by different means. Note that Cj^ > t if and only if there exist distinct y{l), . . . ,y{k) G 
V such that there is no coalescence among the walkers started from these vertices. The 
probability of this "no coalescence event" for a given choice of y{i)^s is P^(fc) (M^'^) > t) for 
yik) = {y{l), . . . , y{k)). Therefore, 

p(Cfc>t)< I V)(^^'^>0 Al. 



By Lemma 14.21 each term in the RHS satisfies: 



Py(fe) {M^^^ >t) <Ce (l + 0('=^err{Q)))nn(Q) 

for some universal C > 0. Since there are at most $|V|^k$ terms in the sum, we have:



P (Cfc >t)< I Vp e (i+o(fc^-(Q)))r.(Q) j /\ i_ 
Integrating the RHS gives: 



m(g) 
for a potentially different, but still universal C. Going through the previous proof, we see
that this gives: 



To continue, we bound the term containing $t^{Q}_{\mathrm{mix}}(\mathrm{err}(Q)^2)$ in terms of $\mathrm{err}(Q)$ (this was easier
before because of the different definition of $\mathrm{err}(Q)$). Recall from Proposition 4.4 that:

$$P_{\pi^{\otimes 2}}\left(M < t^{Q}_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right) \le \mathrm{err}(Q)^2.$$

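Therefore, for all $j \in \mathbb{N}$,

$$P_{\pi^{\otimes 2}}\left(M < j\, t^{Q}_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right) \le \sum_{i=1}^{j} P_{\pi^{\otimes 2}}\left(M \circ \Theta_{(i-1)\, t^{Q}_{\mathrm{mix}}(\mathrm{err}(Q)^2)} < t^{Q}_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right) \le j\, \mathrm{err}(Q)^2.$$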
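On the other hand, taking

$$j = \frac{2\, E_{\pi^{\otimes 2}}[M]}{t^{Q}_{\mathrm{mix}}(\mathrm{err}(Q)^2)},$$

we obtain

$$P_{\pi^{\otimes 2}}\left(M < j\, t^{Q}_{\mathrm{mix}}(\mathrm{err}(Q)^2)\right) \ge 1 - \frac{E_{\pi^{\otimes 2}}[M]}{j\, t^{Q}_{\mathrm{mix}}(\mathrm{err}(Q)^2)} \ge \frac{1}{2}.$$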

Combining these two inequalities gives:

$$\frac{t^{Q}_{\mathrm{mix}}(\mathrm{err}(Q)^2)}{E_{\pi^{\otimes 2}}[M]} = O\left(\mathrm{err}(Q)^2\right).$$

This implies that the term containing $t^{Q}_{\mathrm{mix}}(\mathrm{err}(Q)^2)$ in the RHS of flMl) can be neglected.
Combining that equation with f l5^ and the Sandwich Lemma (Lemma 2.2), we obtain:



d 



Ci 



g Z, j = O err(g) + fc^err(g)2 + 



m(g) 

We choose $k = O\left((\ln |V| / \mathrm{err}(Q))^{1/3}\right)$ to finish the proof, at least if this is smaller than
$1/(5\sqrt{\mathrm{err}(Q)})$. But the bound in the Theorem is trivial if that is not the case, so we are done.

□ 
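The integration step in the proof above bounds an expectation by integrating a tail estimate of the form $\min(1, a\, e^{-t/\tau})$. The following sketch checks the elementary closed form of that integral numerically. The names `a` and `tau` are placeholders introduced here for illustration (think of `a` as the prefactor $C|V|^k$ and `tau` as the exponential time scale); they are not notation from the paper.

```python
import math


def capped_exponential_integral(a: float, tau: float) -> float:
    """Closed form of the integral over t >= 0 of min(1, a * exp(-t / tau)).

    For a > 1 the integrand equals 1 up to t0 = tau * ln(a) and decays
    exponentially afterwards, giving tau * (1 + ln(a)).
    """
    if a <= 1.0:
        return a * tau
    return tau * (1.0 + math.log(a))


def numeric_integral(a: float, tau: float, steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the same integral on a long finite window."""
    # Assumption: a window of 40 time scales past the cap makes the tail negligible.
    upper = tau * (math.log(max(a, 1.0)) + 40.0)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += min(1.0, a * math.exp(-t / tau))
    return total * h


if __name__ == "__main__":
    a, tau = 1.0e6, 2.0
    print(capped_exponential_integral(a, tau), numeric_integral(a, tau))
```

With `a` of order $|V|^k$, the closed form grows like `tau` times $k \ln |V|$, which is the source of the logarithmic factor in the choice of $k$.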



7 Final remarks 

• Cooper et al. [6] consider many other processes besides coalescing random walks. It is
not hard to modify our analysis to study those processes over more general graphs, at 
least when the initial number of random walks is not too large (this restriction is also 
present in [6]). 
• Our Theorems 3.1 and 3.2 can be used to study other problems related to hitting
times. Alan Prata and the present author [TH] have used these results to prove the 
Gumbel law for the fluctuations of cover times for a large family of graphs, including 
all examples where it was previously known. We have also used extensions of these 
results to compute the asymptotic distribution of the k last points to be visited, for any 
constant k: those are uniformly distributed over the graph, as conjectured by Aldous 
and Fill g]. 

Acknowledgement: we warmly thank the anonymous referee for pointing out many typos 
in a previous version of this paper. 

A Proofs of technical results on $L_1$ Wasserstein distance

A.1 Proof of Sandwich Lemma (Lemma 2.2)

Notice that for all $t \in \mathbb{R}$,

$$P(Z_- > t) \le P(Z > t) \le P(Z_+ > t).$$

By convexity, this implies:

$$|P(Z > t) - P(W > t)| \le |P(Z_- > t) - P(W > t)| + |P(Z_+ > t) - P(W > t)|.$$

Integrate both sides over $t \in \mathbb{R}$ to obtain the result.
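As a sanity check of the Sandwich Lemma, the sketch below verifies the inequality on empirical laws. It assumes a particular coupling chosen only for illustration: `z_minus` and `z_plus` are built from `z` by increasing maps so that the ordering holds sample by sample, and `w` is an arbitrary comparison sample. For two empirical laws with the same number of atoms, the $L_1$ Wasserstein distance is the average absolute difference of the sorted samples.

```python
import random


def wasserstein_1(xs, ys):
    """L1 Wasserstein distance between two empirical laws with equally many atoms."""
    assert len(xs) == len(ys)
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def sandwich_check(n=2000, seed=0):
    """Return (d_W(Z, W), d_W(Z_-, W) + d_W(Z_+, W)) for one illustrative sample."""
    rng = random.Random(seed)
    z = [rng.expovariate(1.0) for _ in range(n)]
    # Hypothetical lower and upper variables: Z_- <= Z <= Z_+ holds pointwise.
    z_minus = [0.9 * v for v in z]
    z_plus = [1.2 * v + 0.05 for v in z]
    # Arbitrary comparison law W.
    w = [rng.gauss(1.0, 0.5) for _ in range(n)]
    lhs = wasserstein_1(z, w)
    rhs = wasserstein_1(z_minus, w) + wasserstein_1(z_plus, w)
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = sandwich_check()
    print(lhs, rhs, lhs <= rhs)
```

Because the ordering holds for the sorted samples as well, the inequality holds exactly for these empirical laws and not only approximately.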

A.2 Proof of Conditional Lemma (Lemma 2.3)

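First notice that the sigma field $\sigma(W_1)$ generated by $W_1$ is contained in $\mathcal{G}$. This implies that
for all $t \in \mathbb{R}$:

$$E\left[\,|P(W_2 > t \mid \mathcal{G}) - P(Z_2 > t)|\,\right] = E\left[\,E\left[\,|P(W_2 > t \mid \mathcal{G}) - P(Z_2 > t)|\;\middle|\;\sigma(W_1)\right]\right] \ge E\left[\,|P(W_2 > t \mid \sigma(W_1)) - P(Z_2 > t)|\,\right].$$

Integrating both sides in $t$ and applying Fubini-Tonelli gives:

$$E\left[d_W(\mathrm{Law}(W_2 \mid \mathcal{G}), \mathrm{Law}(Z_2))\right] \ge E\left[d_W(\mathrm{Law}(W_2 \mid \sigma(W_1)), \mathrm{Law}(Z_2))\right].$$

Therefore it suffices to prove the Lemma in the case $\mathcal{G} = \sigma(W_1)$. For simplicity, we
will assume that $(Z_1, Z_2, W_1, W_2)$ are all defined in the same probability space, with $(Z_1, Z_2)$
independent from $(W_1, W_2)$. Let $f : \mathbb{R} \to \mathbb{R}$ be 1-Lipschitz. We have: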

$$E\left[f(W_1 + W_2) \mid W_1 = w_1\right] = \int f(w_1 + w_2)\, P(W_2 \in dw_2 \mid W_1 = w_1).$$

By the dual formulation of $d_W$, we have:

$$\int f(w_1 + w_2)\, P(W_2 \in dw_2 \mid W_1 = w_1) \le \int f(w_1 + z_2)\, P(Z_2 \in dz_2) + d_W(\mathrm{Law}(W_2 \mid W_1 = w_1), \mathrm{Law}(Z_2)).$$

Integrating over $W_1 = w_1$ and using the fact that $Z_2$ is independent from $W_1$, we obtain:

$$E\left[f(W_1 + W_2)\right] \le E\left[f(W_1 + Z_2)\right] + d_W(\mathrm{Law}(W_2 \mid W_1), \mathrm{Law}(Z_2)).$$
But we also have:

$$E\left[f(W_1 + Z_2) \mid Z_2 = z_2\right] = E\left[f(W_1 + z_2)\right] \le E\left[f(Z_1 + z_2)\right] + d_W(W_1, Z_1),$$

and the independence of $Z_1, Z_2$ implies:

$$E\left[f(W_1 + Z_2)\right] \le E\left[f(Z_1 + Z_2)\right] + d_W(W_1, Z_1).$$

We conclude:

$$E\left[f(W_1 + W_2)\right] \le E\left[f(Z_1 + Z_2)\right] + d_W(W_1, Z_1) + d_W(\mathrm{Law}(W_2 \mid W_1), \mathrm{Law}(Z_2)).$$

Since $f$ is an arbitrary 1-Lipschitz function, we are done.
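The bound just proved can be checked exactly in a simple special case. The sketch below assumes, purely for illustration, that $W_1$ and $W_2$ are independent (so the conditional law of $W_2$ is its unconditional law) and that all four variables take values in $\{0, 1, 2, \dots\}$. The probability mass functions in the usage section are made-up examples. For integer-valued laws, $d_W$ is the sum of absolute differences of the cumulative distribution functions.

```python
def convolve(p, q):
    """Pmf of the sum of two independent integer-valued variables with pmfs p and q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def wasserstein_1_pmf(p, q):
    """L1 Wasserstein distance between pmfs on {0, 1, 2, ...} via their CDFs."""
    n = max(len(p), len(q))
    p = list(p) + [0.0] * (n - len(p))
    q = list(q) + [0.0] * (n - len(q))
    total, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for a, b in zip(p, q):
        cdf_p += a
        cdf_q += b
        total += abs(cdf_p - cdf_q)
    return total


if __name__ == "__main__":
    # Hypothetical laws: Z1, Z2 independent, and W1, W2 independent.
    z1, w1 = [0.5, 0.5], [0.4, 0.6]
    z2, w2 = [0.2, 0.3, 0.5], [0.25, 0.25, 0.5]
    lhs = wasserstein_1_pmf(convolve(z1, z2), convolve(w1, w2))
    rhs = wasserstein_1_pmf(z1, w1) + wasserstein_1_pmf(z2, w2)
    print(lhs, rhs, lhs <= rhs)
```

The general lemma allows $W_2$ to depend on $W_1$; that case would require the full conditional laws and is not covered by this sketch.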